The raw numbers presented in Google’s Search Analytics (part of Search Console) are often claimed to be untrustworthy. Let’s look at why this might be.
Validity of the raw numbers
People distrust the raw numbers mainly because the numbers they see don’t make sense to them, or don’t tell them what they’d like to know. However, I see no evidence of the numbers being "cooked" by Google, with two important exceptions.
First, Google clearly states (under the heading Data discrepancies) that it removes the detail for queries that might expose private information. Anything that looks like a phone or credit card number is an obvious candidate for exclusion. The quoted totals will still include these searches; it’s just the detail that’s removed.
They also attempt to remove data from automated searches by bots, though I suspect Google is not very successful with that.
The second exception is when there’s a bug. Search Analytics is software. Software has bugs. When bugs are found they’re (hopefully) fixed. A fix can change (correct) the data displayed. However, there’s no need to invoke conspiracy theories to explain these changes. Anyone who’s dealt with software development knows exactly what I mean.
Why else might people think the numbers are wrong? In essence, I believe it’s because some people don’t understand how complex the raw data actually is, or they don’t allow for how some people actually use Google search, or they haven’t allowed for how their website is configured.
If you are looking at genuinely raw data, then the numbers should always make sense unless there really is a problem. But Search Analytics is not displaying the genuinely raw data. At minimum it shows a combined subset for a day, based on the settings we select. If we don’t understand how the original raw data has been combined, then we may conclude the data is wrong.
Google provides an interesting starting point for understanding how and when the raw data is combined. For example, they explain how the CTR calculation depends on whether you group your data by Page or Query. The apparent discrepancy is a result of the mathematics, not a Google conspiracy.
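To make the Page versus Query difference concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which a search counts as one impression when grouped by Query, but as one impression per listed page when grouped by Page. The searches, URLs and function names are invented for illustration and are not Google’s actual data or code.

```python
# Hypothetical searches: each one lists one or more pages from the same site,
# and the searcher clicks at most one of them.
searches = [
    {"query": "blue widgets", "pages_shown": ["/widgets", "/blue"], "clicked": "/widgets"},
    {"query": "blue widgets", "pages_shown": ["/widgets"], "clicked": None},
    {"query": "blue widgets", "pages_shown": ["/widgets", "/blue"], "clicked": "/blue"},
]


def count_clicks(events):
    """A click is counted once however the data is grouped."""
    return sum(1 for event in events if event["clicked"] is not None)


def ctr_by_query(events):
    """Assumed model: one impression per search, however many pages were listed."""
    clicks = count_clicks(events)
    impressions = len(events)
    return clicks, impressions, clicks / impressions


def ctr_by_page(events):
    """Assumed model: one impression for every page listed in every search."""
    clicks = count_clicks(events)
    impressions = sum(len(event["pages_shown"]) for event in events)
    return clicks, impressions, clicks / impressions


query_clicks, query_impressions, query_ctr = ctr_by_query(searches)
page_clicks, page_impressions, page_ctr = ctr_by_page(searches)
print(f"By Query: {query_clicks} clicks / {query_impressions} impressions = {query_ctr:.0%}")
print(f"By Page:  {page_clicks} clicks / {page_impressions} impressions = {page_ctr:.0%}")
```

The same three searches and the same two clicks give a noticeably lower CTR when grouped by Page, simply because the impression count is larger.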
Summarising data, especially using averages as Search Console does, also has the potential to mislead. Once again, mathematics is the cause, not a Google conspiracy.
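As a rough illustration of how an average can mislead, the sketch below compares two ways of averaging position across two days. The figures are invented, and I am not claiming either method is the one Search Console uses; the point is only that two reasonable-looking averages of the same data can disagree wildly.

```python
# Hypothetical daily figures: (impressions, average position for that day).
daily = [
    (1000, 2.0),   # a busy day, ranking near the top
    (10, 50.0),    # a quiet day, with a handful of deep-page impressions
]


def mean_of_daily_averages(rows):
    """Treat every day equally, ignoring how many impressions it had."""
    return sum(position for _, position in rows) / len(rows)


def impression_weighted_average(rows):
    """Weight each day's position by the number of impressions behind it."""
    total_impressions = sum(impressions for impressions, _ in rows)
    weighted_sum = sum(impressions * position for impressions, position in rows)
    return weighted_sum / total_impressions


print(f"Mean of daily averages:      {mean_of_daily_averages(daily):.2f}")
print(f"Impression-weighted average: {impression_weighted_average(daily):.2f}")
```

The first figure suggests the site sits on page three, while the second suggests it sits near the top of page one. Neither is "wrong"; they answer different questions.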
Real people use Google search - sometimes
I’ve often heard the statement "no one ever looks at page 20 in a Google search". Search Analytics often shows impressions at positions such as 254. Some people regard this as evidence that the numbers must be wrong, as that couldn’t really happen.
There are two things those people are forgetting.
First, I have indeed personally looked at page 20 in a Google search. Many times. There are multiple reasons why I might do this. Perhaps I am desperate to find a specific web page, and so I just keep looking. More commonly, however, I’m researching a client’s Internet presence.
And then there are all those automated tools that I wrote about previously that generate ranking reports (see Why keyword ranking reports are lies). They keep looking, page after page, until they find the target website. Many will look for the target website up to page 30 or more in the results before deciding it’s not listed. Although Google does attempt to remove searches by bots, the point of these searches is to pretend to be humans, so naturally these searches will often be included in your Search Analytics data.
It seems reasonable to assume that the more popular your target search term is for SEO, the more searches by these bots you’ll have in your Search Analytics.
Search Analytics is fairly simplistic in its definition of a website, and this definition may not match your expectations.
The simplest example is when you register your website http://www.example.com with Search Analytics, but some or all pages are also indexed and available under http://example.com or https://www.example.com or https://example.com. In this situation, Search Analytics will show you only the data it believes relates to http://www.example.com, while you may be expecting to see the combined data for all four variants.
If you also have redirects in place between some of those four website variants, Search Analytics may relate some data to a variant you aren’t expecting, so you see lower numbers than you expect. Nothing’s missing; it just isn’t where you’re looking for it.
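If you have registered each variant separately and exported its totals, combining them is simple arithmetic. The sketch below assumes you already have per-variant click and impression totals. The figures and the helper names are hypothetical, and it deliberately treats the www and non-www, http and https forms as one site.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical totals exported separately for each registered variant:
# (site URL, clicks, impressions).
variant_totals = [
    ("http://www.example.com/", 120, 3000),
    ("http://example.com/", 15, 400),
    ("https://www.example.com/", 40, 900),
    ("https://example.com/", 5, 100),
]


def site_key(url):
    """Reduce a URL to its host name, ignoring the scheme and a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def combine_variants(rows):
    """Sum clicks and impressions across all variants of the same site."""
    combined = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for url, clicks, impressions in rows:
        key = site_key(url)
        combined[key]["clicks"] += clicks
        combined[key]["impressions"] += impressions
    return dict(combined)


for site, totals in combine_variants(variant_totals).items():
    print(site, totals)
```

In this invented example the variant you registered first accounts for only two thirds of the clicks, which is exactly the kind of gap that can make the numbers look "wrong".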
Although it may not be obvious, Search Analytics is reporting on a complicated system (your website) and the data is complex. In addition, the data is summarised and sometimes averaged. Often it’s not easy to work out what’s really going on with the data, which leads to the impression the data is wrong.
So far, every time we’ve seen something apparently "wrong" with the data, further investigation has revealed that the data is actually correct. We were just reading it wrong, or not understanding how Google’s search was being used.
Over the next few months I will publish examples on this blog of actual cases where an initial look at the data suggested one thing, but a more in-depth look showed the original conclusion to be wrong.