Your Analysis is Only as Good as Your Data: Cleaning Up Your Google Analytics Account in 2020
When was the last time you checked your air filter? If you’re like me, an individual who can’t find enough time to finish his laundry basket, let alone change his air filters, then you probably change it once every few months (few months = variable term of 2-6 months). Despite this, popular companies that offer HVAC products or services recommend checking your filters as often as every 30 days, depending on how many pets are in the home and how severe your allergies are.
Your Google Analytics account is no different.
As a search engine optimizer, I’ve dived into Google Analytics (GA) accounts for businesses spanning sportswear, outdoor shopping malls, software, and even heating and air. What I’ve learned is that no two Google Analytics accounts are the same, but there are commonalities, one being that there are always opportunities to improve the quality of the data being analyzed.
The old adage, “Your analysis is only as good as your data,” applies to marketing: Google Analytics isn’t a set-it-and-forget-it type of tool. Leave your air filters alone for 12 months and you can expect to be breathing in 12 new species of mold.
All good GA users must determine whether their account is a victim of dirty data, simply provides a passable view of the data, or is producing accurate, trustworthy data to base their assumptions on.
In this post, we’re going to dive into a few common ways that dirty data can deceive you and explain how to get rid of inflated traffic with the right filters. Furthermore, we’re going to talk about cross-domain tracking and how your Analytics account might not only be inflating your metrics but also missing the entire story.
Cleaning Up Spam
Don’t you hate it when you visit your favorite news or magazine website and suddenly a gigantic banner ad stating you’ve won $1 million pops up or redirects you to some insecure site? Nobody likes spam, but few GA users realize that spam doesn’t just affect their website browsing experience – it can affect their GA data as well.
First off - Good Bots vs. Spam Bots
I’d be remiss if I didn’t talk about bots first. Bots are software programs that run scripts to quickly perform repetitive tasks. There are good bots that index sites for search engines or gather other types of information. There are also malicious bots that steal passwords, obtain financial account information or, in our case, inject spam into our data. Think of bots as the first wave of spam to hit your account. Thankfully, filtering out bot traffic can be a fairly easy process. Every GA user should enable bot filtering in their production view to ensure that bot traffic doesn’t affect their data going forward.
An important tip to remember is that, while powerful, this checkbox will not filter out all bot traffic. It only filters out bots that are known to Google. So if your bot traffic happens to be new, it will not be filtered out immediately and will require further filtering.
Spam data in GA can appear in the language report, page title report, and numerous other places, but it’s most commonly found in the referral traffic report. A lot of these URLs are simply spammers trying to deceive you by linking to your site from URLs that mimic well-known URLs (e.g., google.com vs. gouogle.com or yahoo.com vs. yakoo.com) with the intent of injecting your site with malicious scripts.
Taking care of the rest of your spam traffic is as easy as creating an exclude filter with a little help from regex. Let’s look at an example a little closer. Let’s say you look at your referral traffic and you notice there are several URLs that are attempting to insert bad data into your GA account. You notice the three culprits are:
· youhoo.com
· gooogle.com
· gougle.com
In order to filter traffic from those specific URLs, we can use regular expressions (regex) to pinpoint these URLs as source traffic and ensure they never show as referral traffic in our reports. If you’re wondering, “What in the world is regex?”, I haven’t forgotten that we have yet to take a deep dive, but let’s dip our toe in the water first.
Regex, simply put, is a way for GA to recognize a pattern and filter numerous things without getting repetitive. Three basic symbols in regex are the pipe (|), dollar sign ($) and caret (^), which let us build expressions with an “or,” an ending and a beginning:
· | = An “or”: the expression matches the pattern on either side of the pipe
· $ = Anchors the pattern to the end of the text being checked
· ^ = Anchors the pattern to the beginning of the text being checked
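If you’d like to see those three symbols in action outside of GA, here is a minimal sketch using Python’s built-in re module. The list of sources is made up for illustration, and GA’s own regex handling may differ in small ways from Python’s, so treat this as a way to build intuition rather than an exact preview of a GA filter.

```python
import re

# Hypothetical referral sources, used only to illustrate the three symbols.
sources = [
    "gooogle.com",
    "mail.gooogle.com",
    "gooogle.com.example.org",
    "youhoo.com",
]

# ^ pins the pattern to the start of the text, $ pins it to the end,
# and | means "or". The backslash makes the dot a literal dot.
anchored = re.compile(r"^gooogle\.com$|^youhoo\.com$")

# Without ^ and $, the pattern matches anywhere inside the text.
unanchored = re.compile(r"gooogle\.com")

anchored_matches = [s for s in sources if anchored.search(s)]
unanchored_matches = [s for s in sources if unanchored.search(s)]

print(anchored_matches)    # ['gooogle.com', 'youhoo.com']
print(unanchored_matches)  # ['gooogle.com', 'mail.gooogle.com', 'gooogle.com.example.org']
```

Notice how the anchored version only accepts an exact match, while the unanchored version also catches the lookalike subdomains.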
In our spam filter example above, we recognized that the problematic referral traffic comes from youhoo.com, gooogle.com, and gougle.com, so we create a filter that recognizes youhoo.com “OR” gooogle.com “OR” gougle.com as campaign sources (referral traffic sources) and asks GA to leave that data out.
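To make the “OR” idea concrete, here is a rough Python sketch of what such an exclude filter does. The helper names (build_exclude_pattern, drop_spam), the row layout and the session counts are all hypothetical and exist only for this illustration; this is not GA’s own code, just the same logic applied to a small list.

```python
import re

# The three spam sources named in the example.
SPAM_DOMAINS = ["youhoo.com", "gooogle.com", "gougle.com"]


def build_exclude_pattern(domains):
    """Join the domains into one 'A or B or C' pattern, escaping the dots."""
    return "|".join(re.escape(domain) for domain in domains)


def drop_spam(rows, pattern):
    """Keep only the rows whose 'source' does not match the pattern."""
    compiled = re.compile(pattern)
    return [row for row in rows if not compiled.search(row["source"])]


# Hypothetical referral rows; the field names are made up for this sketch.
referrals = [
    {"source": "gooogle.com", "sessions": 120},
    {"source": "partner-blog.example", "sessions": 14},
    {"source": "youhoo.com", "sessions": 87},
    {"source": "gougle.com", "sessions": 45},
]

pattern = build_exclude_pattern(SPAM_DOMAINS)
clean = drop_spam(referrals, pattern)

print(pattern)  # youhoo\.com|gooogle\.com|gougle\.com
print(clean)    # [{'source': 'partner-blog.example', 'sessions': 14}]
```

The printed pattern is the kind of expression you would paste into the filter pattern field of an exclude filter on campaign source.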
Make sense? If it doesn’t, don’t worry too much as we’ll be taking a much deeper dive into the world of regex in a few posts – stay tuned!
The point is that filters like these are commonplace in well-established GA accounts and will help us as we look into further data conundrums you might face.
It’s You! You’re Skewing Your Data
When was the last time you visited your site, and furthermore what type of site do you run? Is it an e-commerce site? Do people need to log in to your site? Does your team use a sandbox site before pushing through changes? Does your sales team show data from your content to potential prospects during sales meetings? Why are you asking me all these questions, Joseph?
The reason is that YOU could be the culprit that’s skewing your data. That’s right, I said you.
Internal traffic, meaning visits to your site from your own team (sales, developers or anyone else at your company), can skew your data. Now if you’re a one-man or one-woman shop and you haven’t touched your site’s infrastructure in a few months, then this won’t affect you too badly. If you’re a small business or a big business with a large digital footprint, then your team can inadvertently lead you to misinterpret your data. Two common filters to use for these cases are IP address filters and city or state filters. Let’s look at a few more examples.
Let’s say you’re a flower shop that exclusively offers flowers in the state of Illinois. Business is booming lately, and as we’re all facing this pandemic, people want to send flowers to loved ones to put a smile on their face (how nice). However, you and your wife can’t handle the influx of calls while delivering flowers in the area, so you outsource duties to a few salespeople across the U.S. in an effort to help create jobs and relieve a bit of stress as well. Those salespeople are located in California, Michigan and Florida, specifically Los Angeles, Ann Arbor and Tallahassee. In order to get the pricing correct, your three employees have to visit the site constantly, which inflates your engagement metrics across the board. So what do you do? Well, since you’re only selling to individuals in the state of Illinois and 99 percent of your calls come from within the state, you can create a filter that excludes traffic from the three cities your salespeople live in. This way, when you go back into your GA account at the end of the week to figure out what specials you should offer based on engagement, you’re not simply looking at the engagement of your three employees.
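Here is a small Python sketch of what that city exclude filter accomplishes for the flower shop. The session rows and pageview counts are invented for the example, and exclude_cities is a hypothetical helper rather than anything GA provides; the point is only to show how much three busy employees can distort a weekly total.

```python
# Cities where the three salespeople live, from the flower shop example.
EXCLUDED_CITIES = {"Los Angeles", "Ann Arbor", "Tallahassee"}


def exclude_cities(sessions, cities):
    """Drop every session whose city is on the exclude list."""
    return [s for s in sessions if s["city"] not in cities]


# Hypothetical weekly sessions; the numbers are made up for this sketch.
sessions = [
    {"city": "Chicago", "pageviews": 4},
    {"city": "Ann Arbor", "pageviews": 31},
    {"city": "Springfield", "pageviews": 6},
    {"city": "Los Angeles", "pageviews": 27},
    {"city": "Tallahassee", "pageviews": 19},
]

customer_sessions = exclude_cities(sessions, EXCLUDED_CITIES)
total_before = sum(s["pageviews"] for s in sessions)
total_after = sum(s["pageviews"] for s in customer_sessions)

print(total_before)  # 87
print(total_after)   # 10
```

In this made-up week, the employees account for 77 of the 87 pageviews, which is exactly the kind of inflation the filter is meant to remove.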
IP Address Filters
Now the city example might not always work, especially if your company has multiple locations or is willing to take clients from across the country. So the next best option is filtering by public IP address. The setup is similar to that of a city filter, except you replace the filter field with IP address and list the IP addresses of your salespeople. Problems tend to arise when IP addresses are changed frequently by internet providers or the number of IP addresses on the list becomes too large to track, but for small businesses this is typically a relatively easy fix.
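The same idea can be sketched for IP addresses with Python’s standard ipaddress module. Every address below comes from the ranges reserved for documentation, so none of them are real, and the /28 range entry is my own assumption to show one way of coping with a longer list; a GA IP filter is configured in the admin interface, not with code like this.

```python
import ipaddress

# Hypothetical public IPs for the salespeople (documentation-only ranges).
# The last entry is a small range rather than a single address.
INTERNAL_IPS = ["203.0.113.10", "203.0.113.25", "198.51.100.0/28"]


def is_internal(ip, internal_entries):
    """Return True if the IP equals, or falls inside, any internal entry."""
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(entry) for entry in internal_entries
    )


# Hypothetical visitor IPs seen during the week.
hits = ["203.0.113.10", "192.0.2.44", "198.51.100.7", "198.51.100.200"]

external_hits = [ip for ip in hits if not is_internal(ip, INTERNAL_IPS)]

print(external_hits)  # ['192.0.2.44', '198.51.100.200']
```

If a provider hands one of your salespeople a new address, the list has to be updated, which is the maintenance headache mentioned above.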
Connecting the Dots with Cross-Domain Tracking
I wanted to quickly touch on this before detailing it more next week. Many sites rely on third-party software to function properly. Take PayPal, for example. On many e-commerce sites throughout the country, users will go to their cart to pay, only to be redirected to PayPal or a similar payment site and eventually come back to the original site’s thank-you page. The user either doesn’t realize they left the original site to begin with, or they understand that it’s just third-party software and that they’ll come back to the original site.
The problem is that Google Analytics does not.
You see, once an individual leaves the site to go to a different URL, GA will end that session under the assumption that the person left the site, rather than simply heading off to pay for their cart. To further complicate things, when they land back on the site they can show up as referral traffic from paypal.com, and under a different user ID than the first visit as well. Does your head hurt yet? Mine does.
Cross-domain tracking solves this problem by letting GA treat that PayPal transaction, or any other visit that has to do with your site (think YouTube, Facebook, cart pages), as part of one continuous journey rather than a series of one-off visits. But, as I said, we’ll dive deeper into that next week!
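To picture the PayPal problem, here is a deliberately simplified Python model of session counting. It assumes that any referrer from outside your own domain starts a new session unless that referrer is on an exclusion list; GA’s real session logic has more rules than this, and count_sessions, the hit layout and flowershop.example are all hypothetical.

```python
def count_sessions(hits, own_domain, excluded_referrers=()):
    """Count sessions in a simplified model: the first hit opens a session,
    and any later hit with an outside referrer that is not on the
    exclusion list opens another one."""
    sessions = 0
    for index, hit in enumerate(hits):
        referrer = hit["referrer"]
        is_external = referrer is not None and referrer != own_domain
        if index == 0 or (is_external and referrer not in excluded_referrers):
            sessions += 1
    return sessions


# One shopper: arrives from search, visits the cart, pays on PayPal,
# then lands on the thank-you page.
hits = [
    {"page": "/roses", "referrer": "google.com"},
    {"page": "/cart", "referrer": "flowershop.example"},
    {"page": "/thank-you", "referrer": "paypal.com"},
]

without_exclusion = count_sessions(hits, "flowershop.example")
with_exclusion = count_sessions(hits, "flowershop.example", {"paypal.com"})

print(without_exclusion)  # 2
print(with_exclusion)     # 1
```

One shopper, one purchase, but two sessions in the first count, with the sale credited to paypal.com. Telling the model to ignore paypal.com as a referrer brings it back to a single session.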
In the meantime, what filters are you using on your account to clean up your data? Are you testing your filters in a test view or are you using that dreaded “all website traffic” view? Still feeling confused about regex? We’d love to hear how you’re optimizing your Analytics account to better suit your business. Thanks for reading and I look forward to sharing more insights next week!
This post was created in an effort to complete my CXL Institute Minidegree Scholarship obligation and speak to the materials reviewed in the course. The information is a combination of my previous knowledge and excellent insights from a phenomenal program.