Tuesday, May 20, 2025

Privacy part 2 - what personal data is stored in Google Analytics?

How does Google Analytics affect the privacy of your audience? That's a good question to ask, not least because there may be legal implications to the answer.

In the first part of this series I looked at where the data on audience members goes. In this part I look at something more basic: what the data is.

As before there is a distinction between what Google know about a member of your audience, and what they let you know about that person. In this case I'm focusing on the latter, because it's hard to know the former (that is true of many companies, not just Google). 

Audience privacy all depends on how Google Analytics is set up.


Google Signals

You know the most about website visitors if you've enabled Google Signals in your Google Analytics (GA) settings. In that case GA will pull info about the audience from the Google accounts they use for their Android phones, their Gmail, their Google Docs, etc. But this only happens if they are logged in at the time of visiting your website, and using the same device.

Of course, a website visitor may use an iPhone, or a Yahoo email address or Microsoft Word. They may not even have a Google account. In that case, turning on Google Signals will not reveal any more information about them. 

When Google Signals is turned on, you see this information about your audience:

  • Age 
  • Gender
  • Interests - for example: 'Food & Dining/Cooking Enthusiasts/Aspiring Chefs'

For quieter websites, thresholding may hide this data about some audience members. I haven't done any testing around that functionality, so I'm unclear how effective it is. 

If you don't enable Google Signals, you'll find the fields listed above are empty in GA:

No data available

Granular location

Have you enabled Granular location and device data collection in the GA property? If so, then GA will store the city of website visitors. They label this 'city', and it can be that. But, it can also be a much smaller entity. For example, I've seen a UK village listed which has a population of 6,000. 

So, where a user lives affects how much privacy they are afforded by Google Analytics. Or does it? I say that because the 'city' value seems to correspond to the location given by the Internet Service Provider (ISP) of the audience member. I've seen ISPs describe location accurately. I've also seen them give a location 30 miles away from the actual location of the user.


Data stored as standard

If neither of the above settings is enabled, then Google will show this information about visitors to your website:

  • Region (for example Florida )
  • Country 
  • Language


Data inferred from user actions

It might be possible to learn about a website visitor from their actions. A website visitor who visits a page designed for gambling addicts may be a gambling addict. Or they might just be interested in the subject.

You might have shared a page address with only a small group of people, and it may not be possible to get there without having the page address. In that case you would know that website visitors are one of that small group. 



Can you identify a website visitor?

In most cases it's not possible for you to identify an audience member. However, if a website has a low level of traffic, it is possible to make an educated guess by combining the analytics data with other information.

Here's an example: imagine Murali is someone you met at an event last month. He said he was from Market Harborough in Leicestershire, UK. You check your stats this month and you see that you've had a visitor from Market Harborough. Is that the same person? 

If your website is quiet - say you get 100 visitors a month and only 5 are from the UK - then it's very likely to be the same person. But if you get 10,000 visitors a month from the UK, then you couldn't say that. 

Either way you could never prove it was Murali who visited.   


More Google Analytics posts

Privacy part 1 - where the data goes in Google Analytics 4

Can Google Analytics give an early warning of going viral?


Tuesday, April 1, 2025

The danger of the Realtime Overview

I have a love/hate relationship with the real-time data in Google Analytics. 

I love the immediacy: people are here right now! This person from France just arrived on the website. Here is someone else: they're from the US. That user is sticking around a while. Do they like what they see?

Realtime overview report

Also, the Realtime Overview report is great for checking an account is working at its basic level: visit your website, head over to Google Analytics, and see whether you are counted.

Sadly, my brain tells me that my love of real-time data is superficial. The reason? The numbers draw me in too close: I lose perspective.

I know one website that gets around 800 active users per day during the week. I kept a close watch on it during a 24-hour period last week. For the Active users in the last 30 minutes measure, the highest number I saw was 33, while the lowest was 14.

That exposes a problem: when I look at the Realtime Overview, am I seeing my website at its busiest moment, or its quietest moment? I just don't know.
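To put rough numbers on that spread, here is a small Python sketch using the figures quoted above. It rests on a simplifying assumption of mine: that each of the 800 daily users is active in only one 30-minute window. In practice a user who stays across two windows is counted in both, so the true average is likely somewhat higher. The variable and function names are purely illustrative.

```python
# Figures quoted in the post.
daily_active_users = 800
observed_low, observed_high = 14, 33

# Assumption: each user is active in exactly one 30-minute window.
windows_per_day = 24 * 2
naive_average = daily_active_users / windows_per_day


def relative_to_average(reading):
    """Express a 30-minute reading as a multiple of the naive average."""
    return reading / naive_average


print(f"Naive average per 30 minutes: {naive_average:.1f}")
print(f"Lowest reading:  {relative_to_average(observed_low):.2f}x the average")
print(f"Highest reading: {relative_to_average(observed_high):.2f}x the average")
```

Under that assumption, a single glance at the report could land anywhere from a little below the average to roughly double it, which is why one reading on its own says so little.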

Moreover, while it's great to see the last 30 minutes, there's a measurement gap here. What happened 44 minutes ago? Or 3 hours back? That data is hidden away until tomorrow arrives.

When I experienced a viral moment at the University of Oxford, I vividly remember watching the real-time number ebb and flow. It was a fruitless exercise: I had nothing to compare it with.


More Google Analytics posts

What does 'Email' mean in Google Analytics, and why are those numbers so small?  

Should I care about average engagement time?



Tuesday, February 18, 2025

Can Google Analytics give an early warning of going viral?

Some years ago I found myself in the middle of a viral moment. It was a wild ride, involving a lion, Jimmy Kimmel and a world-famous university.

I worked for the University of Oxford at the time. My department, the central fundraising office, ran a central donation platform. Most of the colleges and departments had pages on the website to accept gifts. 

I happened to share an office with the staff who did the admin that accompanied donations. One afternoon, by pure chance, I happened to overhear two of them discussing how busy things were that day. My week rapidly unravelled.

It turns out that the illegal hunting of a lion had made the news around the globe. But this lion, known as Cecil, had long been monitored by the Conservation Unit at the University of Oxford. The head of their department somehow got invited onto the Jimmy Kimmel show to talk about the event. 

Lots of viewers were moved by the segment and wanted to give. And our donation platform was their destination. That day we had a 7,000% increase in the number of donations. It was a wild time, as we tried to keep the website up and make the most of the viral moment.

As I reflect, I often think - how could I get an early warning of this sort of event? Time is so critical when you have a large traffic surge. You might only have minutes before your website goes under. In this case I knew quickly because I happened to share an office with some staff from finance.

Could Google Analytics help me with this?

The best way I have found is to use Custom Insights. Sadly, there isn't enough space to provide a full guide to the feature here, but Google's guide to Custom Insights covers the details.

For this particular purpose, the key was to set the evaluation frequency to Hourly. My tests have found that there's a sizeable delay for Daily frequency: email notifications come through 11-28 hours after the end of the day. That's far too long for this sort of situation.

By contrast, a Custom Insight based on Hourly frequency usually delivers results an hour after the time period concerned. For example, an insight from the period 9-10am on a particular day gets delivered by email at 11am that day. 

What metric do you use for the insight? In a viral situation I'm most concerned about the amount of work for the server. Will it fail? I think the best metric for this situation is Views. If you are cloud hosted, and are confident in your host's ability to scale up, then you may pick a metric based on maximising impact.

Hourly traffic is unusual: it currently doesn't show up anywhere in Google Analytics. So I had to experiment to set this up. To begin with I wanted to trigger the insight frequently, so I picked a low value for my test site: 50 views an hour. Then I waited for the first notification. That came, so I was comfortable the mechanism was working.

Over a period of a week I then raised the trigger value in steps: 100, 200, then 300 views. I kept going until I got to a level that wasn't reached with typical traffic fluctuations.
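The same stepping-up logic can be sketched in Python, assuming you already have hourly view counts from somewhere outside Google Analytics (server logs, for instance). This is not how I did it: I adjusted the value by hand in Custom Insights. The function name, the sample counts and the extra candidate values are all hypothetical.

```python
def first_quiet_trigger(hourly_views, candidates):
    """Return the lowest candidate trigger that typical traffic never reaches."""
    peak = max(hourly_views)
    for trigger in sorted(candidates):
        if trigger > peak:
            return trigger
    return None  # every candidate was reached; try larger values


# Hypothetical hourly view counts for a stretch of typical traffic.
typical_hours = [12, 40, 85, 130, 210, 95, 60, 180, 240, 30]

# The values tried in the post, plus two larger hypothetical ones.
candidates = [50, 100, 200, 300, 400, 500]

print(first_quiet_trigger(typical_hours, candidates))
```

With these made-up numbers the busiest hour has 240 views, so the sketch settles on 300. Whether to leave more headroom than that is a judgement call: too low and you get false alarms, too high and the warning arrives late.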

The current value should give me a useful early warning for a viral surge. What would I do then? Probably all of these things:

  • Go to the Realtime reports to identify some details: Where is the traffic going? Which part of the world is it coming from? 
  • Warn my hosting provider and ask if they can increase the resources to the server.
  • Prepare my outage pages for the worst. Can I point to an alternative location, such as a social media post, that will help the users complete their task if the website fails?

If you're paying close attention, you'll notice that I said 'usually'. Why was that? Well, my experiments have found that Hourly Custom Insights aren't triggered in a portion of cases - about 7%. I guess one has to see this approach as a helpful aid rather than a foolproof system.

Can you see a better way to do this in Google Analytics, or another package? Do let me know via Bluesky or Threads.

