As CDNs become ubiquitous, our need to monitor and understand the operational performance of our solution becomes increasingly important. Logs are a brilliant way to gain insight into the health and performance of your CDN.
Logs are often held up as a shining example of observability data, comprising an unstructured collection of quantitative and qualitative indicators of the health of your system, such as latency, request size and request volume. However, there is very little information out there on the techniques and tools you can use to get the most out of these logs.
Let’s take a look at some of the most common values you can use to create operational insights, and some of the tools you may wish to use, in order to visualise your data in the most powerful way.
Metrics of your most popular endpoints
Your CDN logs contain entries for every single request that is issued to your site, meaning you can aggregate this data and render it out in a table, to show which pages are receiving the most traffic. You can count your logs based on the path values that appear in the web log. A typical CDN log is structured like this:
<code>
127.0.0.1 username [10/Oct/2021:13:55:36 +0000] "GET /homepage HTTP/2.0" 200 150 1289
</code>
You can see that this log line was a request for the homepage of your site. You can even see, in this example, the user that issued the request. One option here is simply to aggregate your logs by counting the requests made to each unique path.
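As a rough sketch of that aggregation (assuming the space-delimited format shown above; the file name cdn_access.log is just an illustrative placeholder), you could count requests per path in Python:
<code>
import re
from collections import Counter

# Matches the example format: client, user, [timestamp], "METHOD path PROTOCOL", status, size, duration
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<duration>\d+)'
)

def count_page_visits(log_lines):
    """Count requests per path across an iterable of raw log lines."""
    visits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match:
            visits[match.group("path")] += 1
    return visits

# Print the ten most requested pages
with open("cdn_access.log") as log_file:
    for path, hits in count_page_visits(log_file).most_common(10):
        print(f"{hits:>8}  {path}")
</code>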
When might you render this data in a table?
You can easily see which pages on your site are seeing the most traffic. If, for example, you run a site that profits from ad revenue, you know that ads placed on these pages will see the most impressions, opening up new lines of revenue.
A table offers a clear picture of which products on your site are the most in demand. Knowing which products are being viewed the most can give you insight into the behaviour of your customers and allow you to optimise your conversion strategy.
And what are the security benefits of a line graph?
Understanding your most commonly used pages under normal conditions gives you a clear baseline of typical usage. Any sudden change in this typical usage should be a cause for investigation. It may be a perfectly normal event, but sudden shifts in the typical load profile to your site may indicate somebody using your site in a way you didn’t intend, for example to scrape your content.
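One way to act on that baseline is sketched below; the counts, thresholds and time windows here are illustrative assumptions rather than the output of any particular tool. The idea is simply to compare each path's current traffic against its historical level:
<code>
def flag_unusual_paths(baseline_counts, current_counts, ratio_threshold=3.0, min_requests=100):
    """Flag paths whose current traffic deviates sharply from their historical baseline.

    baseline_counts / current_counts: dicts mapping path -> request count for
    comparable windows (e.g. the same hour last week vs. this hour).
    """
    alerts = []
    for path, current in current_counts.items():
        baseline = baseline_counts.get(path, 0)
        if baseline == 0 and current >= min_requests:
            # A brand-new path suddenly receiving heavy traffic is worth a look.
            alerts.append((path, "new path with heavy traffic", current))
        elif baseline > 0 and current / baseline >= ratio_threshold:
            alerts.append((path, f"{current / baseline:.1f}x baseline", current))
    return alerts

print(flag_unusual_paths({"/homepage": 900, "/about": 40}, {"/homepage": 950, "/about": 400}))
</code>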
Visualise your true latency using percentiles
When graphing how long their site takes to respond, most people simply aggregate their measurements into an average value. This is fine as a high-level overview of how your site is performing. However, the problem with averages is that they hide variance.
Variance is an important metric to track in your CDN, because, while most of your services may be responding quickly, you may find that there are slow parts of your system. These slow parts will be hidden by the overwhelming majority of fast responses. This means you’ll never gain insight into the true latency for your system.
So how do you track the variance in your CDN log latency?
Percentiles. Percentiles give you an indication of the outliers in your data. For example, the median is the 50th percentile, meaning it is the latency at or below which 50% of your responses were served. The 99th percentile is the time within which 99% of your responses were returned to your users. Generally speaking, the higher the percentile, the higher the latency.
This means you can capture the extreme aspects of your CDN latency, while also graphing things like the average as a reference point. A very common option is to graph the following:
| Series | Description | Why it’s useful |
| --- | --- | --- |
| Average | Also known as the arithmetic mean: the sum of all of the values in a given timeframe divided by the number of values in the set. | The average gives you a good high-level view of how your entire system is behaving most of the time. |
| 50th percentile | Also known as the median: the latency at or below which 50% of your responses were served. | The median serves the same purpose as the mean, but it is better suited to data with high variance. Where the average can be skewed by outliers, the median is far more resilient. |
| 95th percentile | The time it took for 95% of requests to your system to respond; roughly 1.6 standard deviations above the mean in a normal distribution. | Where the median and the average aim to be unaffected by outliers, the 95th percentile visualises them specifically. If you’re graphing the average or the 50th percentile, the 95th percentile is indispensable for a true view of your latency. |
| 99th percentile | A more extreme version of the 95th percentile: the time it took for 99% of your requests to respond; roughly 2.3 standard deviations above the mean in a normal distribution. | In systems that need to be very fast, visualising that final 1% helps maintain finely tuned performance. The drawback is that it is extremely sensitive to any change in the data, so it is often left off these charts. |
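As a point of reference, here is a minimal sketch of how these values could be computed from a list of response times pulled from your parsed logs, using Python’s standard library:
<code>
import statistics

def latency_summary(latencies_ms):
    """Return the average, p50, p95 and p99 for a list of latencies in milliseconds."""
    # quantiles(n=100) returns the 99 cut points between the 1st and 99th percentiles
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "average": statistics.mean(latencies_ms),
        "p50": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }

print(latency_summary([120, 95, 110, 140, 3000, 105, 98, 115, 130, 2500]))
</code>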
And what is the best way to visualise your CDN log latency?
Statistically speaking, what we’re really interested in here is the change in latency. If latency suddenly spikes at the 99th percentile, it can give us an indication something has gone wrong. This means we need some time-series representation of the data.
A simple line graph is more than sufficient to render this information out for us. You may wish to spend some time tuning the scale to ensure that your average isn’t hidden by your percentile measurements.
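To produce the data behind such a line graph, one possible sketch (assuming you have already parsed timestamps and latencies out of your log lines) is to bucket measurements per minute and compute the percentiles for each bucket:
<code>
import statistics
from collections import defaultdict

def latency_time_series(entries):
    """Turn (timestamp, latency_ms) pairs into per-minute points of average, p95 and p99.

    entries: iterable of (datetime, float) tuples taken from parsed log lines.
    Returns a list of (minute, average, p95, p99) tuples ready to plot as a line graph.
    """
    buckets = defaultdict(list)
    for timestamp, latency in entries:
        buckets[timestamp.replace(second=0, microsecond=0)].append(latency)

    series = []
    for minute in sorted(buckets):
        values = buckets[minute]
        if len(values) > 1:
            cuts = statistics.quantiles(values, n=100)
            p95, p99 = cuts[94], cuts[98]
        else:
            p95 = p99 = values[0]  # a single sample is its own percentile
        series.append((minute, statistics.mean(values), p95, p99))
    return series
</code>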
Why else might you wish to track your latency this way?
Some pages are slow because they require a lot of processing to load. Slow pages are natural, but very slow pages that consume a lot of backend resources are a potential attack vector for a DDoS. If your 95th percentile is always far higher than the rest of your measures, you know that you’ve got a few pages, services or tools that are introducing variance into your traffic.
Capture the devices that are driving traffic with the User-Agent header
One of the headers that regularly appears in the CDN logs is the User-Agent header. This header gives an indication of the device that has sent the request. Understanding which devices are driving your traffic is crucial for a number of reasons.
Primarily, if you know which devices your customers are using to engage with your site, you’re able to make product decisions about which browsers, devices and versions to optimise for. In 2020, mobile devices drove 61% of traffic in the US, so a mobile-first approach to site building seems safe, but your product may deviate from this. You may find that you have a lot of iPhone customers and not many using Android, or that Safari is the browser of choice for your mobile customers. The User-Agent header will surface this for you.
Secondly, users now operate cross-device. This means that they navigate to a site on their phone, then continue their session on their laptop, or vice versa. Understanding how and why users are doing this is going to give you new insights into how you can make your site experience as fluid as possible.
Hackers often omit the User-Agent field
The User-Agent field is optional, and nefarious scraping tools almost never set it. This means that if your analysis indicates that a sudden, unknown device has begun to consume a lot of traffic, it is worth investigating. The User-Agent field therefore has a lot of potential to let you know when unexpected or malicious traffic has arrived at your site.
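A small sketch of both ideas together, counting requests per User-Agent and flagging how many arrived with the header missing (the field name and the 5% threshold are illustrative assumptions):
<code>
from collections import Counter

def user_agent_breakdown(entries):
    """Count requests per User-Agent, grouping missing or empty values under one key.

    entries: iterable of dicts of parsed log fields, e.g. {"user_agent": "Mozilla/5.0 ..."}.
    """
    counts = Counter()
    for entry in entries:
        agent = (entry.get("user_agent") or "").strip()
        counts[agent if agent else "<missing>"] += 1
    return counts

sample = [
    {"user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"},
    {"user_agent": ""},
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
]
counts = user_agent_breakdown(sample)
missing, total = counts["<missing>"], sum(counts.values())
if total and missing / total > 0.05:  # 5% is an arbitrary example threshold
    print(f"Warning: {missing} of {total} requests had no User-Agent header")
</code>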
Choose your visualisation carefully
The most common way that people visualise this information is using a pie chart. Pie charts are great because they give a really clear indication of proportionality.
However, a pie chart doesn’t give you any indication of the trend in the data. What you need is a visualisation that shows both the trend and the proportionality, and there is a straightforward way to do this: a stacked area chart.
Stacked area charts look visually similar to line graphs, but rather than allowing the series to overlap one another, they stack each series on top of the others. This enables you to easily see the proportional relationship between each series of data.
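As an illustration, a stacked area chart can be built with matplotlib’s stackplot; the daily device-family counts below are made-up placeholder numbers you would replace with your own aggregated User-Agent data:
<code>
import matplotlib.pyplot as plt

# Placeholder daily request counts per device family, derived from User-Agent parsing
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
mobile = [5200, 5400, 6100, 5900, 6400]
desktop = [3100, 3000, 2900, 3200, 3300]
other = [400, 380, 450, 500, 420]

fig, ax = plt.subplots()
ax.stackplot(days, mobile, desktop, other, labels=["Mobile", "Desktop", "Other"])
ax.set_ylabel("Requests per day")
ax.legend(loc="upper left")
plt.show()
</code>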
Conclusion
We’ve been through some simple visualisations that you can employ to capitalise on your data and develop a great understanding of what your CDN is telling you. We’ve also discussed some sensible rules for visualisations, and touched on examples like a pie chart or a table and when they’re most appropriate.
The key to effective visualisation is to understand specifically what you want to do with the data. What is the mission? Is it simply to get an “in the moment” view of the current state of your system, or do you wish to detect change and track performance over time? With these motives, and a powerful observability platform, you’ll be able to open up your CDN and learn all about the true state of your system.