Monday, July 7, 2025

Hey Google Analytics, when you refer to a referral, what do you mean?

Where do people come from? That's a key question when exploring the performance of your website. 

In earlier posts I've explained the organic search and email channels on Google Analytics' Acquisition Report. Today, I want to unpack another channel: Referral. 

List of acquisition channels in GA

So, in the eyes of Google Analytics, what are referrals?

The Acquisition Report breaks things down by sessions and by users. For brevity I'm only going to talk about sessions in the explanation that follows. All principles discussed will apply to users as well.


The quick answer

Google Analytics lists a session as a referral when the user has come from another website.


The detailed answer

An example always helps, I think. Here goes...

Trey lives in Bournemouth in the UK. He saw an article on a local news website about a local company that had raised £10m for a charity called WaterAid. Trey followed a link in the article that led him to the WaterAid website, where he spent some time exploring the work of the charity.

The following month a WaterAid staff member looked at Google Analytics (GA) for their website. In the Acquisition Report GA listed 3,000 sessions in the Referral Channel. One of those was Trey's visit.

Simple, right? 

Not so fast.

If we look more closely we discover that the Referral Channel can also mean some other things. 


Traffic from AI

The numbers for referrals also include sessions that come from AI chatbots.

That feels different to me. When I'm interacting with an AI chatbot I tend to forget I'm on a website. 

I think AI traffic is important to watch, because of predictions that large language models will change the way people look for information. For that reason it's a good idea to customise the channel group in GA so that AI traffic sources are listed separately from referrals.

 

Some traffic from social media

I've seen some traffic from Bluesky and Threads logged in the Referral Channel of GA rather than the Organic Social channel. It's not all traffic from those social networks, just a portion.

In the case of Bluesky, the misplaced traffic has the following source dimension:

go.bsky.app 

In the case of Threads, the traffic has this source dimension:

l.threads.com 

These source values may provide a way to adjust the channel grouping and get more accurate data.
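Here is a minimal sketch of that idea in Python, applied to data exported from GA rather than inside GA itself. The two social source values come from my own reports above. Everything else is an assumption for illustration: the function name, the 'AI' channel label, the example rows and the hostnames ending in .example are all made up, and you would supply your own list of AI sources.

```python
# Source values seen in the Referral Channel that belong in Organic Social.
SOCIAL_SOURCES = {"go.bsky.app", "l.threads.com"}


def regroup_channel(source, default_channel, ai_sources=frozenset()):
    """Return the channel a session should be counted under.

    ai_sources is a set of hostnames supplied by the caller.
    No real AI hostnames are assumed here.
    """
    host = source.strip().lower()
    if host in SOCIAL_SOURCES:
        return "Organic Social"
    if host in ai_sources:
        return "AI"  # hypothetical label for a custom channel
    return default_channel


# Hypothetical rows: (session source, GA's channel, sessions).
rows = [
    ("go.bsky.app", "Referral", 12),
    ("l.threads.com", "Referral", 7),
    ("chatbot.example", "Referral", 20),
    ("localnews.example", "Referral", 40),
]

totals = {}
for source, channel, sessions in rows:
    new_channel = regroup_channel(source, channel, ai_sources={"chatbot.example"})
    totals[new_channel] = totals.get(new_channel, 0) + sessions

print(totals)  # {'Organic Social': 19, 'AI': 20, 'Referral': 40}
```

With these made-up numbers the Referral Channel drops from 79 sessions to 40 once the social and AI sessions are moved out.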



More on Google Analytics

So, what does Organic Search mean?

What does 'Email' mean in Google Analytics, and why are those numbers so small?

Tuesday, May 20, 2025

Privacy part 2 - what personal data is stored in Google Analytics?

How does Google Analytics affect the privacy of your audience? That's a good question to ask, not least because there may be legal implications to the answer.

In the first part of this series I looked at where the data on audience members goes. In this part I look at something more basic: what the data is.

As before there is a distinction between what Google know about a member of your audience, and what they let you know about that person. In this case I'm focusing on the latter, because it's hard to know the former (that is true of many companies, not just Google). 

Audience privacy all depends on how Google Analytics is set up.


Google Signals

You know the most about website visitors if you've enabled Google Signals in your Google Analytics (GA) settings. In that case GA will pull info about the audience from the Google accounts they use for their Android phones, their Gmail, their Google Docs, etc. But this only happens if they are logged in at the time of visiting your website, and using the same device.

Of course, a website visitor may use an iPhone, or a Yahoo email address or Microsoft Word. They may not even have a Google account. In that case, turning on Google Signals will not reveal any more information about them. 

When Google Signals is turned on, you see this information about your audience:

  • Age 
  • Gender
  • Interests - for example: 'Food & Dining/Cooking Enthusiasts/Aspiring Chefs'

For quieter websites, thresholding may hide this data about some audience members. I haven't done any testing around that functionality, so I'm unclear how effective it is. 

If you don't enable Google Signals, you'll find the fields listed above are empty in GA:

No data available

Granular location

Have you enabled Granular location and device data collection in the GA property? If so, then GA will store the city of website visitors. GA labels this 'city', and sometimes it is one. But it can also be a much smaller place. For example, I've seen a UK village listed which has a population of 6,000. 

So, where a user lives affects how much privacy they are afforded by Google Analytics. Or does it? I say that because city seems to correspond to the location given by the Internet Service Provider (ISP) of the audience member. I've seen ISPs describe location accurately. I've also seen them give a location 30 miles away from the actual location of the user.


Data stored as standard

If neither of the above settings is enabled, then Google will show this information about visitors to your website:

  • Region (for example, Florida)
  • Country 
  • Language
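
The three sections above can be summed up in a few lines of Python. This is only a summary of what I've described in this post, not a GA API: the function and parameter names are my own invention, and it ignores thresholding.

```python
def visible_audience_fields(google_signals=False, granular_location=False):
    """List the audience fields described in this post for a given setup."""
    # Shown as standard, whatever the settings.
    fields = ["Region", "Country", "Language"]
    if granular_location:
        fields.append("City")
    if google_signals:
        fields += ["Age", "Gender", "Interests"]
    return fields


print(visible_audience_fields())
# ['Region', 'Country', 'Language']
print(visible_audience_fields(google_signals=True, granular_location=True))
# ['Region', 'Country', 'Language', 'City', 'Age', 'Gender', 'Interests']
```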


Data inferred from user actions

It might be possible to learn about a website visitor from their actions. A website visitor who visits a page designed for gambling addicts may be a gambling addict. Or they might just be interested in the subject.

You might have shared a page address with only a small group of people, and it may not be possible to get there without having the page address. In that case you would know that website visitors are one of that small group. 



Can you identify a website visitor?

In most cases it's not possible for you to identify an audience member. However, if a website has a low level of traffic, it may be possible to make an educated guess by combining the GA data with other information you hold. 

Here's an example: imagine Murali is someone you met at an event last month. He said he was from Market Harborough in Leicestershire, UK. You check your stats this month and you see that you've had a visitor from Market Harborough. Is that the same person? 

If your website is quiet - say you get 100 visitors a month and only 5 are from the UK - then it's very likely to be the same person. But if you get 10,000 visitors a month from the UK, then you couldn't say that. 

Either way you could never prove it was Murali who visited.   
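
If you want to see where that risk sits in your own data, here's a rough sketch in Python. It counts visitors per location and lists the locations with very few visitors, which are the ones where an educated guess becomes possible. The visitor list is made up, and the threshold of 5 is an arbitrary assumption rather than a recognised standard.

```python
from collections import Counter


def small_groups(visitor_cities, threshold=5):
    """Return locations with fewer visitors than the threshold."""
    counts = Counter(visitor_cities)
    return {city: n for city, n in counts.items() if n < threshold}


# Hypothetical month of visitors, one entry per visitor.
visitors = ["Market Harborough"] + ["London"] * 40 + ["Leeds"] * 6

print(small_groups(visitors))  # {'Market Harborough': 1}
```

A location appearing in that result doesn't identify anyone. It just shows where the crowd is too small to hide in.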


More Google Analytics posts

Privacy part 1 - where the data goes in Google Analytics 4

Can Google Analytics give an early warning of going viral?


Tuesday, April 1, 2025

The danger of the Realtime Overview

I have a love/hate relationship with the real-time data in Google Analytics. 

I love the immediacy: people are here right now! This person from France just arrived on the website. Here is someone else: they're from the US. That user is sticking around a while. Do they like what they see?

Realtime overview report

Also, the Realtime Overview report is great for checking an account is working at its basic level: visit your website, head over to Google Analytics, are you counted?

Sadly, my brain tells me that my love of real-time data is superficial. The reason? The numbers draw me in too close: I lose perspective.

I know one website that gets around 800 active users per day during the week. I kept a close watch on it during a 24-hour period last week. For the Active users in the last 30 minutes measure, the highest number I saw was 33, while the lowest was 14.

That exposes a problem: when I look at the Realtime Overview, am I seeing my website at its busiest moment, or its quietest moment? I just don't know.
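
Some back-of-envelope arithmetic shows how wide that swing is. This is a rough sketch with a big simplification: it assumes each of the 800 daily users shows up in exactly one half-hour slot, which isn't how the rolling 30-minute measure really works.

```python
daily_active_users = 800         # weekday figure for the website
windows_per_day = 24 * 60 // 30  # 48 half-hour slots in a day

even_spread = daily_active_users / windows_per_day
highest, lowest = 33, 14         # readings from my 24-hour watch

print(f"Even spread per 30 minutes: {even_spread:.1f}")      # 16.7
print(f"Highest reading vs lowest: {highest / lowest:.1f}x")  # 2.4x
```

So the busiest reading was more than double the quietest, and nothing in the Realtime Overview tells you which end of that range you're looking at.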

Moreover, while it's great to see the last 30 minutes, there's a measurement gap here. What happened 44 minutes ago? Or 3 hours back? That data is hidden away until tomorrow arrives.

When I experienced a viral moment at the University of Oxford, I vividly remember watching the real-time number ebb and flow. It was a fruitless exercise: I had nothing to compare it with.


More Google Analytics posts

What does 'Email' mean in Google Analytics, and why are those numbers so small?  

Should I care about average engagement time?