Spam in Google Analytics is now common, high volume, and a serious nuisance. But most of it is easy to defeat, if you know how.
Why this article?
I’m not the first to write about this. There are several excellent articles that explain what the spam is and what to do about it, such as https://moz.com/blog/stop-ghost-spam-in-google-analytics-with-one-filter and https://blog.kissmetrics.com/removing-google-analytics-referral-spam/. However, some of these are a little out of date. For example, you might be advised to watch for new sessions with one page view; spammers are now sneakier, and most of the spam I see no longer fits that profile.
Although there’s good information to be found, many Analytics accounts still suffer from spam. By explaining it in different words, I hope this article will help you understand what to do.
What is it?
Analytics spam is data in your Google Analytics reports that isn’t from genuine website visitors.
Spammers do this for many reasons, one of which may be to generate income from fake affiliate traffic.
Why does it matter?
If you use Google Analytics to help make business decisions, you should care about analytics spam. It makes your website look busier than it really is, and makes you think traffic is coming from places it isn’t.
Recently I’ve seen a lot of spam traffic purporting to be organic traffic from Google. On one website, more than 60% of the traffic labelled as organic and from Google was spam. In other words, that website was receiving less than half the organic traffic its owner thought.
How does it happen?
Spam traffic can be split into two groups: ghost spam, and the rest. Most of your spam traffic, typically 95% or more, is likely to be ghost spam.
Ghost spam is created by a program that sends data directly to Google Analytics, without ever visiting your website. So changing settings on your website won’t affect ghost spam; don’t bother editing .htaccess or anything like that in an attempt to block it.
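To make this concrete, here is a minimal Python sketch of what a hit looks like from the receiving side. It assumes the hit arrives as a URL-encoded payload in the style of the Measurement Protocol, where the hostname travels in the dh field. The payload, the property ID and the spam-site.xyz domain are all made up for illustration, and hit_hostname is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload in the style of a Measurement Protocol request.
# Every value is invented for illustration; nothing is sent anywhere.
payload = (
    "v=1&t=pageview&tid=UA-XXXXX-Y&cid=555"
    "&dh=spam-site.xyz&dp=%2F&dr=http%3A%2F%2Fspam-site.xyz%2F"
)

def hit_hostname(raw_payload: str) -> str:
    """Return the hostname the sender claimed, or "(not set)" if none was given.

    The "(not set)" fallback is an assumption chosen to resemble how a
    missing hostname appears in reports.
    """
    fields = parse_qs(raw_payload)
    return fields.get("dh", ["(not set)"])[0]

print(hit_hostname(payload))           # hostname claimed by the sender
print(hit_hostname("v=1&t=pageview"))  # no hostname supplied at all
```

The point of the sketch is that the hostname is simply whatever the sender chose to write. Your web server never sees the request, which is why nothing you change on the website can stop it, and why the hostname is a useful thing to filter on.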
The other type of spam, commonly known as referral spam, does visit your website. (Ghost spam can look like referral spam if you don’t know what to look for.) But even for spam traffic that does visit your website, blocking at the website is usually the wrong approach. Blocking visitors using .htaccess or a similar mechanism will slow down your website for genuine visitors, potentially more so than simply accepting the spam traffic. The reasons are technical, so I won’t elaborate here. Contact me if you’d like more detail. (Hint: the Apache documentation recommends avoiding .htaccess if you can.)
What to do?
The good news is that removing future ghost spam is easy: one Google Analytics filter will do it.
The instructions at https://moz.com/blog/stop-ghost-spam-in-google-analytics-with-one-filter, under the heading “You only need one filter to deal with ghost spam”, will in one fell swoop remove all of your future ghost spam from your filtered Analytics view.
Be very careful you follow the instructions correctly, especially when creating the regular expression. A single minor typo here can prevent the filter working as intended. In particular, make sure you put the . (periods), | (pipes) and \ (backslashes) in the right places.
Don’t be tempted to create multiple Include Hostname filters. It won’t work. You must have one, and only one, Include Hostname filter.
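If you want to see how much damage a small typo can do, you can try your expression out before trusting it. The following is a rough Python sketch, not the filter from the linked instructions. The domains example.com and example.org are placeholders for the hostnames you really track, is_included is a hypothetical helper, and I am assuming that Python’s regular expressions behave like the Analytics filter for simple patterns of this kind.

```python
import re

# Placeholder domains: substitute the hostnames you actually track.
GOOD_PATTERN = r"example\.com|example\.org"

# Two typical typos, shown only to illustrate their effect.
UNESCAPED_DOT = r"example.com|example.org"      # "." now matches any character
TRAILING_PIPE = r"example\.com|example\.org|"   # empty alternative matches anything

def is_included(pattern: str, hostname: str) -> bool:
    """Mimic an Include filter: keep the hit if the pattern matches anywhere."""
    return re.search(pattern, hostname) is not None

hostnames = ["example.com", "www.example.org", "spam-site.xyz",
             "examplexcom.net", "(not set)"]

for host in hostnames:
    print(host,
          is_included(GOOD_PATTERN, host),
          is_included(UNESCAPED_DOT, host),
          is_included(TRAILING_PIPE, host))
```

With the correct pattern only the two genuine hostnames are kept. Without the backslashes, a lookalike such as examplexcom.net slips through. With a stray pipe at the end, every hostname matches and the filter removes nothing at all.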
Will this remove all spam?
No. This one filter won’t remove non-ghost spam. However, once this filter has had time to work, any spam that remains is likely to be referral spam. With a little more work, you can block that as well using Analytics filters.
But I’m still seeing lots of spam!
Then you’ve almost certainly got a typo somewhere in your ghost spam filter, or you’ve misapplied it to your view.
An easy way to confirm this is to list the hostnames in Google Analytics.
On the left side of the Reporting section for your filtered view, choose Audience, then Technology, then Network.
Now choose Hostname as the primary dimension, just above the table.
Ensure the date range (top right) starts the day after you added your ghost spam filter.
The list then displayed by Analytics should contain only the website domains you are collecting Analytics data for. That may be just your own domain (e.g. example.com).
Repeat the same steps with your unfiltered view. This time you will probably see lots of unrecognised domains. That is the ghost spam your filter is blocking.
If the filtered list includes unexpected domains or “(not set)”, there is something wrong with your spam filter (or, in the case of “(not set)”, your event tracking). Compare it carefully to the original instructions. If there are desired domains in the unfiltered list that are missing from the filtered list, then you need to add them to the regular expression of your filter.
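If you have many hostnames, the comparison can be done mechanically. Here is a small Python sketch of the check described above. It assumes you have copied the hostname column from each view into a list. The function name audit_hostnames and all of the sample hostnames are hypothetical.

```python
def audit_hostnames(filtered, unfiltered, expected):
    """Compare hostname lists from the filtered and unfiltered views.

    Returns two sorted lists:
      unexpected: hostnames in the filtered view that you do not track
                  (the filter is letting something through)
      missing:    tracked hostnames seen in the unfiltered view but absent
                  from the filtered view (the filter is too strict)
    """
    filtered_set = set(filtered)
    unfiltered_set = set(unfiltered)
    expected_set = set(expected)

    unexpected = sorted(filtered_set - expected_set)
    missing = sorted((unfiltered_set & expected_set) - filtered_set)
    return unexpected, missing


# Made-up sample data for illustration only.
expected = ["example.com", "shop.example.com"]
filtered = ["example.com", "(not set)"]
unfiltered = ["example.com", "shop.example.com", "spam-site.xyz", "(not set)"]

unexpected, missing = audit_hostnames(filtered, unfiltered, expected)
print("Should not be in the filtered view:", unexpected)
print("Should be added to the filter:", missing)
```

In this made-up case the filtered view still shows “(not set)”, so the filter or the event tracking needs checking, and shop.example.com needs adding to the regular expression.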