Tracking Blog Performance in R Using Google Analytics
Chapter 1: Introduction to Blog Performance Tracking
Understanding the performance of your blog or website is crucial for improvement. In this guide, we will explore how to utilize the {googleAnalyticsR} package in R to analyze metrics such as page views, sessions, users, and engagement.
The Stats and R blog has reached its one-year milestone since its launch on December 16, 2019. In this article, I’ll share insights from the first year, including the benefits of maintaining a technical blog, and showcase data analysis using the {googleAnalyticsR} package.
This will also serve as a reflection on content creation, distribution, and future plans, as well as an insight into my journey as a data science blogger.
For several reasons, I've chosen to present my analytics through the {googleAnalyticsR} package rather than using the standard Google Analytics dashboards:
- I want to introduce the capabilities of the {googleAnalyticsR} package to marketers familiar with R, encouraging them to enhance their dashboards with R-generated visualizations.
- Automation is key; by using R, I can easily replicate analyses yearly, allowing me to track the blog's evolution over time.
- While I am not a marketing expert, I hope to provide useful ideas for data analysts, SEO specialists, and fellow bloggers interested in tracking their blog’s performance using R.
Before diving in, it’s worth noting that I do not rely on my blog for income. My writing would likely differ if it were my primary source of revenue.
Prerequisites for Using Google Analytics in R
To get started with the {googleAnalyticsR} package, we need to install and load it in R:
# install.packages('googleAnalyticsR', dependencies = TRUE)
library(googleAnalyticsR)
Next, we must authorize access to our Google Analytics account using the ga_auth() function:
ga_auth()
This command opens a browser window where you authorize access, then saves an access token so you do not have to repeat the step in later sessions. Make sure to run this function in an R script rather than in an R Markdown document.
After authorization, we need the ID of the Google Analytics account we wish to access. Use the following command to list all available accounts associated with your email:
accounts <- ga_account_list()
accounts
As the output shows, I have two accounts tied to my Google Analytics profile: one for my personal site and another for this blog. Make sure to select the account linked to your blog:
view_id <- accounts$viewId[which(accounts$webPropertyName == "statsandr.com")]
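If the property name is misspelled, or if a property has several views, the subsetting above quietly returns an empty or multi-element vector. A minimal defensive sketch follows, using a made-up `accounts` data frame in place of the real `ga_account_list()` output. The IDs, the second property name, and the helper `pick_view_id()` are all hypothetical:

```r
# Mock of the two columns used above; the real ga_account_list() output has more
accounts <- data.frame(
  webPropertyName = c("personal-site.example", "statsandr.com"),
  viewId = c("111111111", "222222222"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return exactly one view ID or stop with a clear message
pick_view_id <- function(accounts, property) {
  ids <- accounts$viewId[accounts$webPropertyName == property]
  if (length(ids) != 1) {
    stop("Expected exactly one view for '", property, "', found ", length(ids))
  }
  ids
}

view_id <- pick_view_id(accounts, "statsandr.com")
view_id
```

Failing early here is cheaper than discovering an empty `view_id` only when the first query errors out.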
With these preliminary steps completed, we can now delve into analyzing Google Analytics data in R.
Analytics Overview: Users, Page Views, and Sessions
To start, we will examine some basic metrics like the total number of users, sessions, and page views for the entire site over the past year:
# Set the date range
start_date <- as.Date("2019-12-16")
end_date <- as.Date("2020-12-15")
# Fetch Google Analytics data
gadata <- google_analytics(view_id,
  date_range = c(start_date, end_date),
  metrics = c("users", "sessions", "pageviews"),
  anti_sample = TRUE
)
gadata
In its inaugural year, Stats and R attracted 321,940 users, resulting in 428,217 sessions and 560,491 page views.
For those unfamiliar with Google Analytics terminology:
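- A user is a visitor who started at least one session within the selected date range; the count includes both new and returning visitors.
- A session is a group of interactions that a single user has with your site within a given time frame. By default, a session ends after 30 minutes of inactivity.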
- A page view counts every instance a page is loaded or reloaded.
For example, if User A reads three blog posts and then leaves, while User B reads one blog post and the about page before exiting, Google Analytics will report two users, two sessions, and five page views.
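The arithmetic of this example can be checked with a toy hit log. This is only an illustrative sketch: the `hits` data frame and its column names are made up, and Google Analytics does not expose raw hits in this form.

```r
# Toy hit log for the example above: one row per page load (made-up data)
hits <- data.frame(
  user    = c("A", "A", "A", "B", "B"),
  session = c("A-1", "A-1", "A-1", "B-1", "B-1"),
  page    = c("/post-1", "/post-2", "/post-3", "/post-1", "/about"),
  stringsAsFactors = FALSE
)

# Users and sessions are distinct counts; page views are simply the number of rows
summary_counts <- c(
  users     = length(unique(hits$user)),
  sessions  = length(unique(hits$session)),
  pageviews = nrow(hits)
)
summary_counts
```

If User A came back the next day and read one more post, the log would gain a row with a new session ID, giving two users, three sessions, and six page views.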
Sessions Over Time
Next, let’s visualize the daily session count over time using a scatterplot:
# Retrieve Google Analytics data for sessions over time
gadata <- google_analytics(view_id,
  date_range = c(start_date, end_date),
  metrics = c("sessions"),
  dimensions = c("date"),
  anti_sample = TRUE
)
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Create scatter plot with trend line
gadata %>%
  ggplot(aes(x = date, y = sessions)) +
  geom_point(size = 1L, color = "steelblue") +
  geom_smooth(color = "darkgrey", alpha = 0.25) +
  theme_minimal() +
  labs(
    y = "Sessions",
    x = "",
    title = "Daily Session Evolution",
    subtitle = paste0(format(start_date, "%b %d, %Y"), " to ", format(end_date, "%b %d, %Y")),
    caption = "Data: Google Analytics data from statsandr.com"
  )
The visualization reveals a significant traffic spike around late April, attributed to the viral blog post "A package to download free Springer books during Covid-19 quarantine." Following this peak, the daily session count stabilized, with an upward trend observed in the months leading up to the end of the data range.
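Rather than reading the peak off the plot, the busiest day can also be located programmatically. The sketch below runs on simulated data with an artificial spike on an arbitrarily chosen date; none of the numbers are the blog's real figures, and `daily` stands in for the `gadata` data frame returned by the query above.

```r
# Simulated daily sessions over the same date range (not the blog's real data)
set.seed(42)
dates <- seq(as.Date("2019-12-16"), as.Date("2020-12-15"), by = "day")
daily <- data.frame(date = dates, sessions = rpois(length(dates), lambda = 1000))

# Inject an artificial spike on an arbitrary date so there is something to find
daily$sessions[daily$date == as.Date("2020-04-26")] <- 15000

# Row of the busiest day, and how far it sits above a typical (median) day
peak <- daily[which.max(daily$sessions), ]
peak_ratio <- peak$sessions / median(daily$sessions)
peak
round(peak_ratio, 1)
```

Comparing the peak with the median rather than the mean keeps the baseline from being inflated by the spike itself.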
Sessions by Channel
Understanding how visitors reach your blog is vital. Here’s how to visualize daily sessions by channel:
# Retrieve data for sessions by channel
trend_data <- google_analytics(view_id,
  date_range = c(start_date, end_date),
  dimensions = c("date"),
  metrics = "sessions",
  pivots = pivot_ga4("medium", "sessions"),
  anti_sample = TRUE
)
# Rename variables for clarity
names(trend_data) <- c("Date", "Total", "Organic", "Referral", "Direct", "Email", "Social")
# Transform data into long format
library(tidyr)
trend_long <- gather(trend_data, Channel, Sessions, -Date)
# Create line plot
ggplot(trend_long, aes(x = Date, y = Sessions, group = Channel)) +
  theme_minimal() +
  geom_line(aes(colour = Channel)) +
  labs(
    y = "Sessions",
    x = "",
    title = "Daily Sessions by Channel",
    subtitle = paste0(format(start_date, "%b %d, %Y"), " to ", format(end_date, "%b %d, %Y")),
    caption = "Data: Google Analytics data from statsandr.com"
  )
The results show a predominant share of traffic stemming from organic search, indicating that many visitors find the blog through search engine queries. This is expected given the tutorial nature of the content.
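To put a number on each channel's share, the long-format data can be summed per channel. The sketch below uses a small made-up data frame, `trend_long_demo`, with the same three columns as `trend_long`; the values are invented for illustration only.

```r
# Made-up long-format data with the same columns as trend_long above
trend_long_demo <- data.frame(
  Date     = rep(as.Date("2020-01-01") + 0:1, each = 3),
  Channel  = rep(c("Organic", "Referral", "Direct"), times = 2),
  Sessions = c(600, 100, 300, 700, 50, 250),
  stringsAsFactors = FALSE
)

# Total sessions per channel, then each channel's share of all sessions (in %)
channel_totals <- sapply(split(trend_long_demo$Sessions, trend_long_demo$Channel), sum)
channel_share <- round(100 * channel_totals / sum(channel_totals), 1)
sort(channel_share, decreasing = TRUE)
```

With the real `trend_long`, the rows where `Channel` is `"Total"` would need to be dropped first; otherwise every session is counted twice.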
Sessions by Day of the Week
To examine traffic patterns based on the day of the week, we can create a boxplot:
# Retrieve data for sessions by day of the week
gadata <- google_analytics(view_id,
  date_range = c(start_date, end_date),
  metrics = "sessions",
  dimensions = c("dayOfWeek", "date"),
  anti_sample = TRUE
)
# Recode dayOfWeek variable
gadata$dayOfWeek <- recode_factor(gadata$dayOfWeek,
  "0" = "Sunday",
  "1" = "Monday",
  "2" = "Tuesday",
  "3" = "Wednesday",
  "4" = "Thursday",
  "5" = "Friday",
  "6" = "Saturday"
)
# Reorder dayOfWeek for better visualization
gadata$dayOfWeek <- factor(gadata$dayOfWeek,
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
)
# Boxplot
gadata %>%
  ggplot(aes(x = dayOfWeek, y = sessions)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    y = "Sessions",
    x = "",
    title = "Sessions by Day of the Week",
    subtitle = paste0(format(start_date, "%b %d, %Y"), " to ", format(end_date, "%b %d, %Y")),
    caption = "Data: Google Analytics data from statsandr.com"
  )
The boxplot shows several outliers, most likely caused by the viral post mentioned earlier. Once those outlying days are excluded, Wednesdays tend to have the highest median session count, while weekends attract fewer visitors.
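The filtering step itself is not shown above. One possible approach, sketched here on simulated data, is to drop the days whose session count lies above the usual boxplot upper fence (third quartile plus 1.5 times the interquartile range) before comparing medians. This rule is an assumption made for illustration and is not necessarily the filter behind the result quoted above; `demo` and its values are invented.

```r
# Simulated daily sessions with one artificial "viral" week (not real data)
set.seed(1)
dates <- seq(as.Date("2020-01-06"), by = "day", length.out = 140)  # starts on a Monday
demo <- data.frame(date = dates, sessions = rpois(length(dates), lambda = 1000))
demo$sessions[50:56] <- demo$sessions[50:56] * 10

# Day of week as a factor, Monday first (computed without relying on the locale)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
demo$dayOfWeek <- factor(
  day_names[as.POSIXlt(demo$date)$wday + 1],
  levels = c(day_names[-1], "Sunday")
)

# Keep only the days at or below the upper boxplot fence
upper_fence <- quantile(demo$sessions, 0.75) + 1.5 * IQR(demo$sessions)
filtered <- demo[demo$sessions <= upper_fence, ]

# Median sessions per weekday after filtering
medians <- tapply(filtered$sessions, filtered$dayOfWeek, median)
medians
```

An alternative is to exclude an explicit date window around the viral post, which is easier to explain to readers but requires choosing the window by hand.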
Sessions by Device Type
Finally, understanding how users engage with your content across different devices can provide valuable insights. The query below retrieves sessions and average time on page by device category, and the chart that follows focuses on session counts.
# Fetch Google Analytics data by device category
gadata <- google_analytics(view_id,
  date_range = c(start_date, end_date),
  metrics = c("sessions", "avgTimeOnPage"),
  dimensions = c("date", "deviceCategory"),
  anti_sample = TRUE
)
# Plot sessions by device category
gadata %>%
  ggplot(aes(deviceCategory, sessions)) +
  geom_bar(aes(fill = deviceCategory), stat = "identity") +
  theme_minimal() +
  labs(
    y = "Sessions",
    x = "",
    title = "Sessions by Device Type",
    subtitle = paste0(format(start_date, "%b %d, %Y"), " to ", format(end_date, "%b %d, %Y")),
    caption = "Data: Google Analytics data from statsandr.com"
  )
The chart shows that most sessions came from desktop devices, with mobile accounting for a smaller share.
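Because the query includes the `date` dimension, the result has one row per date and device, and the bar chart stacks those daily values. A sketch of collapsing the data to one row per device before plotting or reporting is shown below, on made-up numbers; the data frame `device_demo` is hypothetical.

```r
# Made-up data with one row per date and device, like the query result above
device_demo <- data.frame(
  date = rep(as.Date("2020-01-01") + 0:1, each = 3),
  deviceCategory = rep(c("desktop", "mobile", "tablet"), times = 2),
  sessions = c(800, 150, 50, 700, 250, 50),
  stringsAsFactors = FALSE
)

# One row per device: total sessions and share of all sessions (in %)
by_device <- aggregate(sessions ~ deviceCategory, data = device_demo, FUN = sum)
by_device$share_pct <- round(100 * by_device$sessions / sum(by_device$sessions), 1)
by_device
```

Summing is appropriate for counts such as sessions. An average such as `avgTimeOnPage` should not be summed across days in the same way.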
Understanding these metrics not only helps in tracking the blog’s performance but also guides future content strategies and improvements.
With this foundation, you are now equipped to analyze your blog's performance using R and the {googleAnalyticsR} package. I encourage you to explore these tools further and adapt them to your specific needs. If you have any questions or want to share your experiences, feel free to comment below!
Thank you for reading, and I look forward to connecting with fellow bloggers and data enthusiasts in the future.