Google Analytics is a great tool not only for beginners but also for advanced users. Over the years, Google has done a lot to improve its usability and functionality. Still, it is very difficult to create one tool that suits everyone, for a simple reason: each person or company has different goals and needs.
I really like the Google Analytics interface, which has changed greatly over the last few years and has been integrated with other Google products such as Google Tag Manager and Google Data Studio. So, in my view, GA does very well in terms of usability. However, as your analytics practice matures and you want to automate certain tasks, you will run into the limitations of Google Analytics.
Likewise, advanced statistical analysis or anomaly detection cannot be done directly in Google Analytics. Fortunately, Google provides an API that lets you pull your data out of Google Analytics. In this article, I will focus on how to do this with R, specifically in RStudio.
What is R Programming
R is a programming language developed by Ross Ihaka and Robert Gentleman in 1993. R has an extensive catalogue of statistical and graphical methods that you simply cannot find in Google Analytics, including machine learning algorithms, linear regression, time series analysis, statistical inference, anomaly detection and many more. R is not only an academic tool: many large companies use it, including Uber, Google, Airbnb and Facebook. Today, R is one of the leading languages for data analytics and statistics.
If you do not have R yet, you can download it from this link. Installing R is fast and easy: just follow the instructions that appear once you open the downloaded installation file.
If you have not downloaded RStudio yet, you can download it here. RStudio is an integrated development environment (IDE) for R; it offers many useful features and makes working with R much easier.
Google Analytics + R = googleAnalyticsR
Now we are getting to the main part of the article: how to get data from Google Analytics using R. There are currently several libraries you can use to download data from Google Analytics.
The available libraries are:
We will work with the googleAnalyticsR library, which uses the Google Analytics Reporting API v4.
Before you can use googleAnalyticsR, the library needs to be installed.
The first step is to run install.packages("googleAnalyticsR") and then library(googleAnalyticsR). The first command installs the package and the second loads it into your R session; without them, you cannot use any googleAnalyticsR functionality.
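If you rerun your script often, you may not want to reinstall the package every time. A minimal sketch, using a hypothetical helper named ensure_package() (it is not part of googleAnalyticsR), installs a package only when requireNamespace() reports it missing and then loads it:

```r
# Hypothetical helper (not part of googleAnalyticsR): install a package only
# if it is not installed yet, then load it into the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage:
# ensure_package("googleAnalyticsR")
```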
As you can see in the picture, there are two commands that handle the authorization with the Google Account from which you want to pull the Google Analytics data.
As with googleAnalyticsR, you have to install and load the googleAuthR library, which takes care of the authentication.
Once you have finished this step, run ga_auth(new_user = TRUE). This command opens a new window with your available Google accounts. Sign in to the account whose Google Analytics data you want to extract.
You can now test whether your authorization works by running ga_account_list(). The easiest approach is to assign its result, a list of all your accounts, to a variable:
my_accounts <- ga_account_list()
If the command ran correctly, you should see the variable my_accounts in the Environment pane on the right side of RStudio. Clicking on it opens a data frame with your available accounts.
For your first query, you need the ID of the view from which you want to extract the data, so look up that ID in your list of accounts.
It looks like this:
In the video below, you can see how to proceed with all the commands mentioned above.
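If you have many accounts, scrolling through my_accounts to find the right view can be tedious. The sketch below filters the account list by view name instead. It assumes that the data frame returned by ga_account_list() has columns named viewId and viewName (check names(my_accounts) in your own session), and find_view_id() is a hypothetical helper. The demo data is made up and only mimics that shape.

```r
# Hypothetical helper: return the view IDs whose view name matches a pattern.
# Assumes the account list has viewId and viewName columns, as in the
# data frame returned by ga_account_list() (verify with names(my_accounts)).
find_view_id <- function(accounts, pattern) {
  hits <- accounts[grepl(pattern, accounts$viewName, ignore.case = TRUE), ]
  if (nrow(hits) == 0) {
    stop("No view matches: ", pattern)
  }
  hits$viewId
}

# Made-up example data with the assumed columns
demo_accounts <- data.frame(
  accountName = c("My Company", "My Company"),
  viewId      = c("111111", "222222"),
  viewName    = c("All Web Site Data", "Filtered view"),
  stringsAsFactors = FALSE
)

find_view_id(demo_accounts, "filtered")
```

With your real data, you would call find_view_id(my_accounts, "your view name") and store the result in a variable for the query.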
Create the first query
To start, you need the ID of the view that you want to pull into R, so store it in a variable:
view_id <- 123456
Remember to replace this dummy ID with your real view ID. Next, set the period for which you want to get the data. I prefer a period of at least 3 months, but of course it depends on the type of analysis you want to perform.
yesterday <- Sys.Date() - 1
NdaysAgo <- Sys.Date() - 90
In the example above, we used the Sys.Date() function, which returns the current date (today). Subtracting 1 from it gives the date of the previous day (yesterday). Similarly, NdaysAgo subtracts 90 days, so the range from NdaysAgo to yesterday covers the last 90 full days, roughly 3 months. The advantage is that you do not have to define the dates by hand: every time you run the script, it automatically covers the most recent 90 days.
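The same date arithmetic can be wrapped in a small function so that the window length becomes a parameter. This is a minimal sketch with a hypothetical helper, last_n_days(); it returns the same two Date values as NdaysAgo and yesterday when n is 90, and the optional today argument exists only to make the result reproducible.

```r
# Hypothetical helper: start and end date of the last n full days,
# ending yesterday. With n = 90 this matches NdaysAgo and yesterday.
last_n_days <- function(n, today = Sys.Date()) {
  end   <- today - 1   # yesterday
  start <- today - n   # n days ago
  c(start, end)
}

range_90 <- last_n_days(90)
format(range_90, "%Y-%m-%d")

# The window is inclusive, so it contains exactly n days
length(seq(range_90[1], range_90[2], by = "day"))
```

You could then pass last_n_days(90) to date_range in the same way as c(NdaysAgo, yesterday).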
When everything is ready, you can go ahead and write your first query! It will be simple, to show how to build and submit a correct query.
In the video below, you can see how to proceed with all the commands mentioned above.
NOTE 1: Here you will find a list of all dimensions and metrics available in Google Analytics. Remember that with the googleAnalyticsR library, dimensions and metrics are written without the ga: prefix (for example, sessions rather than ga:sessions).
NOTE 2: You can also list all dimensions and metrics in RStudio using google_analytics_meta():
meta <- google_analytics_meta()
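If you copy metric and dimension names from Google's documentation, they usually carry the ga: prefix. A minimal sketch of a hypothetical helper, strip_ga_prefix(), removes it so the names match the style used in the googleAnalyticsR queries in this article:

```r
# Hypothetical helper: remove a leading "ga:" from metric/dimension names.
# Names without the prefix are returned unchanged.
strip_ga_prefix <- function(x) {
  sub("^ga:", "", x)
}

strip_ga_prefix(c("ga:sessions", "ga:deviceCategory", "week"))
# "sessions" "deviceCategory" "week"
```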
Now, on to your first query 🙂
# Pull the data. This is set to pull the last 90 days of data.
gadata <- google_analytics_4(view_id,
                             date_range = c(NdaysAgo, yesterday),
                             metrics = c("sessions"),
                             dimensions = c("week", "deviceCategory"),
                             anti_sample = TRUE)
Each argument is explained below:
gadata – the data frame that stores the results
google_analytics_4 – the function that sends every query to the Reporting API v4
view_id – the ID of the view in Google Analytics
date_range = c(NdaysAgo, yesterday) – the time frame (start date, end date)
metrics = c("sessions") – the metrics we are going to use
dimensions = c("week", "deviceCategory") – the dimensions we are going to use
anti_sample = TRUE – splits the request into smaller calls so that the data is not sampled
After you run the query above in the RStudio console, it pulls the data from the Google Analytics API and stores the results in a data frame called gadata.
It is always helpful to run head(gadata) to check whether the data you pulled from the Google Analytics API is correct and contains all the metrics and dimensions you need.
Clicking on the gadata data frame in the Environment pane on the right side of RStudio opens a new tab with the data pulled from Google Analytics.
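Besides eyeballing head(), you can check programmatically that every requested metric and dimension came back as a column. This minimal sketch assumes that googleAnalyticsR names the result columns after the requested metrics and dimensions without the ga: prefix; check_columns() is a hypothetical helper and the demo rows are made up.

```r
# Hypothetical helper: TRUE if every requested metric and dimension is a
# column of the data frame; otherwise warn about the missing ones and
# return FALSE.
check_columns <- function(df, metrics, dimensions) {
  expected <- c(dimensions, metrics)
  missing  <- setdiff(expected, names(df))
  if (length(missing) > 0) {
    warning("Missing columns: ", paste(missing, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

# Made-up rows shaped like the result of the first query
demo <- data.frame(
  week = c("01", "01"),
  deviceCategory = c("desktop", "mobile"),
  sessions = c(120, 80)
)

head(demo)
check_columns(demo, metrics = "sessions", dimensions = c("week", "deviceCategory"))
```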
Congratulations, you have just created your first query and extracted data from Google Analytics.
Remember, the principle is always the same; just do not mix metrics and dimensions that have different scopes. Such combinations can still return numbers, but those numbers may be inaccurate.
Data visualization from Google Analytics
Now that you have extracted unsampled data from Google Analytics, it is also important to visualize it. Visualization gives us a better understanding of the data for a given time period. In R, ggplot2 is an invaluable helper, and if you are (or will become) an R and Google Analytics enthusiast like me, ggplot2 will become one of your best friends.
First, you need to install and load the library, using the same commands as at the beginning with googleAnalyticsR, only this time for ggplot2:
install.packages("ggplot2")
library(ggplot2)
Once these commands finish, ggplot2 is available in your workspace, and you can make your first visualization of Google Analytics data.
# Boxplot of sessions by week for the last 90 days
# (group = week draws one box per week)
gg <- ggplot(gadata, aes(x = week, y = sessions, group = week)) + geom_boxplot()
gg  # print the plot
When you enter the command above into the R console, R draws the following graph. In my case, I used a boxplot because I also wanted to see the spread between the minimum and maximum values for each week over the last 3 months.
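The boxplot shows the weekly spread visually; if you also want the exact numbers, base R can compute them. This is a minimal sketch on made-up rows shaped like gadata from the first query (week, deviceCategory, sessions): it computes the minimum and maximum sessions per week with aggregate() and then their difference.

```r
# Made-up rows shaped like gadata from the first query
demo <- data.frame(
  week = c("01", "01", "01", "02", "02", "02"),
  deviceCategory = rep(c("desktop", "mobile", "tablet"), 2),
  sessions = c(120, 80, 10, 150, 95, 12)
)

# Minimum and maximum sessions per week, joined on week
mins <- aggregate(sessions ~ week, data = demo, FUN = min)
maxs <- aggregate(sessions ~ week, data = demo, FUN = max)
weekly <- merge(mins, maxs, by = "week", suffixes = c("_min", "_max"))

# Spread between the busiest and quietest value in each week
weekly$range <- weekly$sessions_max - weekly$sessions_min
weekly
```

Replacing demo with your own gadata gives the same table for your view.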
Now, let's modify your first query to get pageviewsPerSession along with the number of sessions. Then we will segment the data by device (desktop, mobile, tablet) and create another graph based on the medium dimension. All of this can be done in one script.
First, add another metric, pageviewsPerSession, to the metrics vector. Next, add the date and medium dimensions to the dimensions vector.
# Pull the data. This is set to pull the last 90 days of data.
gadata <- google_analytics_4(view_id,
                             date_range = c(NdaysAgo, yesterday),
                             metrics = c("sessions", "pageviewsPerSession"),
                             dimensions = c("date", "medium", "deviceCategory"),
                             max = -1)  # -1 fetches all rows
# Sessions vs. pages per session, one panel per device category
gadata_viz <- ggplot(gadata, aes(x = sessions, y = pageviewsPerSession)) + geom_point() + facet_grid(~deviceCategory)
gadata_viz
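When comparing devices, note that pageviewsPerSession is a ratio: simply averaging the daily values gives a quiet day the same weight as a busy one. A minimal sketch on made-up rows shaped like the second query rebuilds pageviews as sessions × pageviewsPerSession, sums per device and divides again, which yields a session-weighted ratio. Requesting the pageviews metric directly in the query would be an alternative.

```r
# Made-up rows shaped like gadata from the second query
demo <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02")),
  medium = c("organic", "organic", "cpc", "cpc"),
  deviceCategory = c("desktop", "desktop", "mobile", "mobile"),
  sessions = c(100, 300, 50, 50),
  pageviewsPerSession = c(4, 2, 3, 1)
)

# Rebuild pageviews from the ratio, then total sessions and pageviews per device
demo$pageviews <- demo$sessions * demo$pageviewsPerSession
by_device <- aggregate(cbind(sessions, pageviews) ~ deviceCategory,
                       data = demo, FUN = sum)

# Session-weighted pages per session for each device
by_device$pageviewsPerSession <- by_device$pageviews / by_device$sessions
by_device
```

For desktop, this gives 1000 / 400 = 2.5 pages per session, while a plain mean of the two daily values (4 and 2) would give 3.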
If you want to create a similar graph segmented by medium, you do not need a new query, because the values are already in the data frame. Just edit the ggplot command.
You will use the gadata data frame again, but this time facet the grid by medium instead of deviceCategory. Only the facet_grid() argument changes:
# Before: one panel per device category
gadata_viz <- ggplot(gadata, aes(x = sessions, y = pageviewsPerSession)) + geom_point() + facet_grid(~deviceCategory)
# After: one panel per medium
gadata_viz <- ggplot(gadata, aes(x = sessions, y = pageviewsPerSession)) + geom_point() + facet_grid(~medium)
However, this graph is very difficult to read because it shows many variables at once. So let's make it a little easier to understand:
filtered_data <- gadata[gadata$sessions > 30, ]
gadata_viz <- ggplot(filtered_data, aes(x = sessions, y = pageviewsPerSession)) + geom_point() + facet_grid(~medium)
As you can see, we now plot only a subset of the data, based on the filter we just created. This makes the graph clearer and easier to read.
The filter above keeps only the rows (one per date, medium and device combination) with more than 30 sessions. Of course, the filter logic is up to you, depending on your goals and what you need to see.
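The row-level filter can keep a single busy day of an otherwise small channel. If you would rather keep whole channels (mediums) that average more than 30 sessions per day, one possible variation, sketched below on made-up rows, first sums sessions per medium and day, then averages per medium. Keep in mind that days on which a medium had no sessions are typically absent from the API result, so they are not counted in this average.

```r
# Made-up rows shaped like gadata from the second query
demo <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02")),
  medium = c("organic", "organic", "referral", "referral"),
  deviceCategory = "desktop",
  sessions = c(40, 50, 5, 45),
  pageviewsPerSession = c(3, 2, 4, 1)
)

# Total sessions per medium and day (summed over devices)
daily <- aggregate(sessions ~ medium + date, data = demo, FUN = sum)
# Average daily sessions per medium
avg_daily <- aggregate(sessions ~ medium, data = daily, FUN = mean)

# Keep every row of the mediums whose daily average exceeds the threshold
min_daily_sessions <- 30
keep <- avg_daily$medium[avg_daily$sessions > min_daily_sessions]
filtered_channels <- demo[demo$medium %in% keep, ]
filtered_channels
```

Here referral averages 25 sessions per day and is dropped entirely, whereas the row-level filter would still keep its 45-session day.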
In this article, we've shown how to create a simple query to get unsampled data from Google Analytics. It is important to understand how R works, as well as the googleAnalyticsR library itself. Once you can construct a query correctly, you can extract whatever data you need from Google Analytics and then process or analyze it. The advantage of combining R and Google Analytics is that we can automate repetitive tasks with a few commands. I personally use R with Google Analytics for automated auditing, automated analytics and page performance, as well as for automatically adding filters, goals, and custom dimensions or metrics. The uses of R with Google Analytics are almost limitless, ranging from descriptive analytics to customer segmentation. And that's why we love it :).
If you have any questions, do not hesitate to write to us. We are happy to discuss your point of view.