How To Exclude Bot Traffic From Your Google Analytics

Eduardo Rocha Senior Sales Engineer and Security Analyst
7 Min read

Data is the cornerstone of all online marketing and business decisions. Proper understanding of visitors and market trends, behaviour and industry shifts – it all comes down to data driven insights. And those insights can only be as good as your data

In order to gather data we use a variety of monitoring and analytical tools, and one such widely used is definitely Google Analytics (GA). It helps gather intel about traffic, users, channels and behaviour, making it the source of all the main website metrics. But if you’ve used it, you certainly noticed some data simply doesn’t add up, and you rightfully doubt the legitimacy of it.

How One AI-Driven Media Platform Cut EBS Costs for AWS ASGs by 48%

How One AI-Driven Media Platform Cut EBS Costs for AWS ASGs by 48%

The most frequent source of inaccurate data picked up by your GA is bot traffic. This should come as no surprise considering that over half of the Internet’s traffic is made by bots, according to a report by Imperva Incapsula. Bot traffic that gets picked up by analytical tools can end up skewing your data reports, which can lead to wrong assumptions and conclusions, impact your site performance and ultimately even harm your business. In this article we’ll cover the basics of bot traffic, how to detect it and how to eliminate it.

A pair of robotic hands typing on a glowing,modern keyboard in a dark environment.
Image Source

Tweet this: Data-driven decisions can only be as good as the data itself

About Bots

Considering that for every human hit there’s a bot hit to a server, it is safe to say that bots are an essential component of the internet infrastructure. One one side we have “good bots” which are used by organisations to gather information and perform automated tasks (such as search engine bots that crawl and index your site), while on the other side there are “bad bots” deployed by cyber-criminals whose sole intent is to steal data or participate in botnets for launching powerful DDoS attacks (such as the Dyn one).

According to Imperva Incapsula, 22.9% of all traffic is made by good bots while 28.9% are malicious. Good bots, such as search engine crawlers, respect your robots.txt file instructions and are excluded from GA reports by default. Bad bots, on the other hand, visit your site with all kind of intentions like spamming, content scraping or malware distribution. Advanced bots often do a great job at imitating human behaviour making it very difficult to separate them from regular human visitors. Vast media coverage of security breaches has further pushed the online community to rethink their security solutions. Many of those solutions offer a high level of bot protection, filtering out most bot traffic.

A cute robotic character sitting on the floor and using a laptop.
Image Source

Being treated mainly as a security issue, bots are often an afterthought for marketers. However, even if most of the bot traffic gets eliminated, there are still residues left that your GA easily picks up and treats as legitimate visits. These residues can end up messing up your analytical data. When talking about bad bots that get picked up by GA, there are two main types:

  • Ghost bots – The ones that never actually visit your site. This are nothing more than spam nuisance and show under referral traffic in GA.
  • Zombie bots – Fully render your site. They produce analytics spam by triggering your analytics code as an after effect of their activity.

How To Detect Bot Traffic

In the past you may have heard that bots won’t show in your GA data because they don’t trigger Javascript. Unfortunately, today that’s not true. Also, the Filter Known Bots & Spiders checkbox within GA provides some degree of protection but isn’t fully efficient against ghost and zombie bots. Some early indications of bot traffic in your analytical reports can be:

  • Low average session duration
  • High bounce rates
  • No goal completion
  • High values for new visitors
Summary of web analytics data with sessions and conversion metrics.
Image Source

Tweet this: GA Bot signals: Low session duration, High bounce rates, No goal completion, Bursts of new visitors

If you check your GA dashboard regularly you could encounter a sudden burst in traffic on some occasions. If there were no special occasions, campaigns or social events that could justify the sudden increase in traffic then it is likely due to bot traffic. In that case make sure to check some of the following GA reports for those unexpected spikes in traffic:

  • New vs. Returning Users  By comparing new and repeated sessions you can easily track bot traffic as bots get picked as new users but provide low levels of website engagement. (Path: GA – Audience – Behavior – New vs Returning User)
  • Browser & OS – After tracking a specific browser that achieved a high number of sessions, you can narrow down this report to find the version responsible for the sudden burst of traffic. Cross-check it with bounce rates and session durations to verify the assumption of bot correlation. (Path: GA – Audience – Technology – Browser & OS)
  • Network Domain – The “Network” report provides a list of ISPs from which you traffic is generated, such as Google, Verizon or Amazon. By adding a secondary dimension “network domain” you can narrow down traffic by domain. Usually the most common generator of bot traffic is “amazonaws.com”. Once you detect the domain that generated bot traffic you can exclude that traffic by applying a custom filter. (Path: GA – Audience – Technology – Network)

How to Exclude Bot Traffic

Google has announced a global solution for filtering bots and referral spam in GA but until it gets released here are a few tricks you can implement by yourself. Before you start applying any filters, make sure to set up an unfiltered view – a “filter with no filters” – that will encompass all your website traffic data, including bots. Don’t skip this step, nobody wants to lose data to a typo.

As a first step, you should get to know your bot traffic. As bots can be good, bad and neutral, make sure all your internal teams are coordinated on the topic. Align marketing, IT, sales, site operations, etc. efficiently as some bots may be related to partners, used tools or extensions and thus legitimate.

Next is a recap of the procedures needed to successfully exclude bot data. For an in depth step-by-step guide, make sure to check our previous article on the topic of filtering bot traffic.

To exclude bot data, you need to label bots as bots. Next, you need to create appropriate filters in GA to apply them to your data in order to keep it as clean as possible. Start by going to the Admin section in GA, then Settings and then Create Copy. Name it – www.yourwebsite.com// Bot Exclusion View or similar. You will then use this new view to filter out bot traffic. At first it will appear empty but will build up with time.

Screenshot showing admin settings interface with options to copy view and move to trash.

After previously detecting your bot sources, you can now proceed to setting up filters that will eliminate invalid data from your reports. For ghost bots, you will need to setup a filter by Hostname and take note of all the valid hostnames. After that create a regex that will hold only those. The new regex will also capture all subdomains on the main domain. Now head back to your Bot Exclusion view to add a new custom filter. Select Include Only Hostname and add the created regex into the field.

Now, with zombie bots, it’s a bit more complicated. You’ll have to filter zombie bots by detecting their footprints first. Use the reports where you previously detected bot traffic and add those suspicious sources to a new regex. Proceed to detect other bot footprints and repeat the regex procedure. After detecting zombie bot footprints, head back to Admin section – Filters in Bot Exclusion view to apply the filters. Do the same steps as for ghost bots, but instead of Hostname, create filters to exclude each regex data. You can see a step-by-step guide for setting up filters here.

Advanced Filtering

Now the new view will filter almost all bot traffic. At some point, however,  you will probably need to verify historical traffic in your original view. This will require you to set up an Advanced Segment that replicates the filters applied earlier. You’ll need to Add a Segment in the Reporting dashboard of your original view. Name it something like “Bot Filter” and add all the new filters in the Advanced – Conditions section (keep in mind the Include/Exclude setting).  

Screenshot of the Audience Overview dashboard with the 'New Segment' button highlighted.

Tweet this GA Advanced Segment Filter – All your bot filters in one place to use anytime

By doing so you will have and Advanced Segment filter at your disposal. This will contain all your bot filters and you will be able to apply it on any report and even for any selected date range. Also there are lots of other advanced techniques that can be implemented. But if you don’t have adequate technical skills, we strongly suggest to avoid these:

  • Server-side technical changes such as .htaccess edits – requires technical knowledge and can easily take the wrong turn.
  • Referral Exclusion under Property – it has proven to be inaccurate, it often shifts the visit to a (none)/Direct visit, doesn’t provide a universal solution and doesn’t allow to check false positives with your historical data.

Conclusion

As said, bots make more than half of all Internet traffic and are a force to be reckoned. With the advance of technology, bot sophistication is rising and their impact on businesses is increasing. Until a global solution gets put in place, creating GA filters will bill your safest pick. Worth mentioning is the fact that eliminating bot traffic from your GA reports simply hides them from your data but they do still hit your servers, and may consume resources as well as impact the performance of your web assets. Stopping bot traffic altogether requires high expertise and adequate tools. Also, all the above described steps can be stressful and time-consuming so at a certain point you might want to consider contacting experts on the matter. If you seek excellence in eliminating analytics spam without risking your unfiltered data, filtering false positives or creating unsustainable server changes, our experts are always here to help. Feel free to contact our experts at GlobalDots as they can help you pick security and performance solutions that best suit your needs.

 

Latest Articles

Complying with AWS’s RI/SP Policy Update: Save More, Stress Less

Shared Reserved Instances (RIs) and Savings Plans (SPs) have been a common workaround for reducing EC2 costs, but their value has always been limited. On average, these shared pools deliver only 25% savings on On-Demand costs—far below the 60% savings achievable with automated reservation tools. For IT and DevOps teams, the trade-offs include added complexity, […]

Itay Tal Head of Cloud Services
5th December, 2024
The Future of Cybersecurity: Shlomo Kramer’s Bold Predictions for the SASE Era

What does the next decade of cybersecurity hold? Few can answer that better than Shlomo Kramer—co-founder of Check Point and Imperva, and founder & CEO of Cato Networks. In a candid conversation on the CloudNext podcast, Shlomo shared bold predictions and actionable strategies for navigating the challenges and opportunities ahead. From the rise of SASE […]

Ganesh The Awesome Senior Pre & Post-Sales Engineer at GlobalDots
4th December, 2024
Three Ways CISOs Can Combat Emerging Threats in 2025

73% of CISOs fear a material cyberattack in the next 12 months, with over three-quarters convinced AI is advancing too quickly for existing methods to combat it. But what can CISOs do to prepare for the coming wave – and access the resources they need to deal with this evolving threat landscape? To find out, […]

11th November, 2024
How Optimizing Kafka Can Save Costs of the Whole System

Kafka is no longer exclusively the domain of high-velocity Big Data use cases. Today, it is utilized on by workloads and companies of all sizes, supporting asynchronous communication between even small groups of microservices.  But this expanded usage has led to problems with cost creep that threaten many companies’ bottom lines. And due to the […]

Itay Tal Head of Cloud Services
29th September, 2024

Unlock Your Cloud Potential

Schedule a call with our experts. Discover new technology and get recommendations to improve your performance.

    GlobalDots' industry expertise proactively addressed structural inefficiencies that would have otherwise hindered our success. Their laser focus is why I would recommend them as a partner to other companies

    Marco Kaiser
    Marco Kaiser

    CTO

    Legal Services

    GlobalDots has helped us to scale up our innovative capabilities, and in significantly improving our service provided to our clients

    Antonio Ostuni
    Antonio Ostuni

    CIO

    IT Services

    It's common for 3rd parties to work with a limited number of vendors - GlobalDots and its multi-vendor approach is different. Thanks to GlobalDots vendors umbrella, the hybrid-cloud migration was exceedingly smooth

    Motti Shpirer
    Motti Shpirer

    VP of Infrastructure & Technology

    Advertising Services