AWS Network Performance Management Problems and Solutions

Ganesh The Awesome Senior Pre & Post-Sales Engineer at GlobalDots

31st May, 2019 5 Min read

Cloud-based online services and applications face the challenge of optimizing end-user experience in the face of ever-growing bandwidth demand. This is further complicated by network-related aspects of application performance such as latency, congestion and jitter, which are inherent to the architecture of the internet.

These issues can have a very negative impact on end user experience and can vary widely depending on the customer’s geographical location or the ISP/network that they use. End users on the US East Coast may have a great experience, while those in Europe see long latencies and very slow performance of your Web shop or application. Similarly, end users at a specific ISP may experience a very responsive service, whereas their neighbor down the street is confronted with a very sluggish web site with long page loading times due to slow delivery of ads in the pages.

In this article we discuss AWS network performance management problems and solutions.

Key problems in AWS network management

1. Business problem

For online service providers, network performance is essentially a black box: they have little to no visibility into performance metrics like latency and congestion. Today, (Net) DevOps personnel have to diagnose network performance issues manually and redirect traffic to avoid them. This is not an exact science and is mostly reactive. Putting in place the hardware to optimize the network-related aspects of cloud application performance is also costly and complex, and the absence of cost-effective, automated tools for improving these metrics in any meaningful way adds to the problem.

The business impact, however, is very real:

Delayed Ad serving, bidding for Adtech
Lower conversion rates in eCommerce
Customer support issues
Lower return rates
Increased churn rates
Higher bandwidth costs
Reduced business/service competitiveness
Losing out on potential sales
End user quality of service suffers

Network providers have a vested interest in BGP route selection. Not all routes through the internet cost the same. BGP route selections are often influenced by business interests of network providers and their wish to control next hop selection.

ISPs often choose to route traffic through the network paths that bring them the most financial benefit rather than those with the best performance. There have been documented cases of large ISPs intentionally creating congestion at some network nodes in order to charge service providers premium rates for non-congested paths. In effect, they create a lesser version of the internet so that they can charge more for the internet as usual.

2. Technical problem

The internet is a huge mesh of complex, interconnected networks. It uses two groups of routing protocols to determine the path of traffic through these networks: Interior Gateway Protocols (IGPs) for intradomain routing, and the Border Gateway Protocol (BGP) for interdomain routing between Autonomous Systems (ASes). One way to understand network performance issues is to look at how BGP routes internet traffic.

BGP serves as the standardized routing protocol of the internet. It was designed in the early days of the internet with a focus on network reachability and stability. However, it does not route traffic to optimize performance metrics like latency, congestion and packet loss. In addition, with the explosive growth of the internet, it has become very hard to analyze, manage and troubleshoot.

BGP works by exchanging routing and reachability information between autonomous systems on the internet. It makes routing decisions based on a number of attributes, including reachability and the AS_PATH attribute. All else being equal, this translates into choosing routes that are reachable and traverse the fewest ASes. BGP has no way to evaluate routes by performance metrics like latency, congestion and packet loss, so these crucial metrics are left out of routing decisions. As a result, network traffic often suffers from high latency, congestion and packet loss.
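To make the gap concrete, here is a minimal Python sketch that contrasts a simplified shortest-AS_PATH choice with a latency-aware choice. It is an illustration only: real BGP best-path selection considers more attributes than path length, and the Route class, the neighbor names and the latency figures are made up for this example. The AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str       # hypothetical neighbor name
    as_path: tuple      # AS numbers traversed toward the destination
    latency_ms: float   # measured out of band; BGP itself never sees this


def bgp_style_choice(routes):
    """Pick the route with the fewest AS hops (simplified; ignores other BGP attributes)."""
    return min(routes, key=lambda r: len(r.as_path))


def latency_aware_choice(routes):
    """Pick the route with the lowest measured latency."""
    return min(routes, key=lambda r: r.latency_ms)


# Two illustrative routes to the same destination prefix.
routes = [
    Route("transit-a", (64496, 64497), 180.0),
    Route("transit-b", (64498, 64499, 64500), 45.0),
]

print(bgp_style_choice(routes).next_hop)      # shorter AS_PATH wins
print(latency_aware_choice(routes).next_hop)  # faster path wins
```

With these made-up numbers, the shorter AS_PATH is also the slower path, which is exactly the situation a purely hop-based decision cannot detect.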

GlobalDots’ solution

GlobalDots probes upwards of 600,000 network prefixes in real time and collects performance data for every path through the network. This data is then processed and analyzed to determine the best path through the network, the one with the lowest latency, congestion, bandwidth cost and packet loss. (Net) DevOps can create rules to automate traffic routing and, in the process, minimize network performance issues.
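One way to picture "best path" when several metrics compete is a weighted score per candidate path. The sketch below is a hypothetical illustration, not the GlobalDots algorithm: the PathSample fields, the weights and the sample values are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSample:
    provider: str         # hypothetical transit provider name
    latency_ms: float
    loss_pct: float
    cost_per_mbps: float


# Hypothetical weights: how strongly each metric penalizes a path.
DEFAULT_WEIGHTS = {"latency_ms": 1.0, "loss_pct": 50.0, "cost_per_mbps": 10.0}


def score(sample, weights=DEFAULT_WEIGHTS):
    """Lower is better: a weighted sum of the measured metrics."""
    return (
        weights["latency_ms"] * sample.latency_ms
        + weights["loss_pct"] * sample.loss_pct
        + weights["cost_per_mbps"] * sample.cost_per_mbps
    )


def best_path(samples, weights=DEFAULT_WEIGHTS):
    """Return the sample with the lowest weighted score."""
    return min(samples, key=lambda s: score(s, weights))


samples = [
    PathSample("transit-a", latency_ms=40.0, loss_pct=2.0, cost_per_mbps=1.0),
    PathSample("transit-b", latency_ms=70.0, loss_pct=0.1, cost_per_mbps=2.0),
]

print(best_path(samples).provider)
```

Changing the weights changes the winner. A rule that only cares about latency would pick transit-a here, while the default weights above penalize its packet loss and pick transit-b.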

In a Cloud/AWS deployment, the GlobalDots appliance sits between the virtual infrastructure and the transit providers. The connection to the virtual infrastructure is a physical connection (i.e. AWS Direct Connect over 1000BASE-LX or 10GBASE-LR). Each customer is connected to the GlobalDots premises using a unique VLAN identifier.

Within the GlobalDots premises, there is a virtual routing table for each application that reflects the specific performance requirements of the customer. GlobalDots collects RIB (Routing Information Base) data from routeviews.org. A RIB is a database that stores the routing information each BGP speaker receives from its peers.
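As a rough mental model, a per-application routing table can be thought of as a mapping from prefixes to next hops, looked up by longest prefix match. The Python sketch below uses only the standard ipaddress module. The application names, prefixes (taken from documentation ranges) and next-hop labels are hypothetical and do not describe the actual GlobalDots data structures.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return (prefix, next_hop) for the most specific prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, table[best]


# One hypothetical routing table per application.
tables = {
    "web-shop": {
        ipaddress.ip_network("198.51.100.0/24"): "transit-a",
        ipaddress.ip_network("198.51.100.128/25"): "transit-b",
        ipaddress.ip_network("0.0.0.0/0"): "default-transit",
    },
    "ad-bidder": {
        ipaddress.ip_network("198.51.100.0/24"): "transit-b",
    },
}

print(longest_prefix_match(tables["web-shop"], "198.51.100.200"))
print(longest_prefix_match(tables["ad-bidder"], "203.0.113.9"))
```

The same destination address can resolve to different next hops depending on which application's table is consulted, which is the point of keeping the tables separate.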

Diagram illustrating the architecture of a system including components like AWS Direct Connect

Next, GlobalDots probes all prefixes in the RIB for specific metrics like latency, congestion, packet loss and bandwidth cost. A Spark cluster processes and analyzes this performance data and generates an optimized routing policy for the metrics each customer has specified. The policy is generated by matching the performance attributes of each destination network (prefix) to the customer's requirements.
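The matching step can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the production pipeline: it runs in a single process rather than on a cluster, and the Requirement thresholds, the measurement layout and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    max_latency_ms: float   # hypothetical customer threshold
    max_loss_pct: float     # hypothetical customer threshold


def build_policy(measurements, requirement):
    """Map each prefix to the lowest-latency provider that meets the requirement.

    measurements: {prefix: {provider: (latency_ms, loss_pct)}}
    Prefixes with no qualifying provider are left out of the policy,
    so normal BGP path selection would continue to apply to them.
    """
    policy = {}
    for prefix, by_provider in measurements.items():
        eligible = {
            provider: metrics
            for provider, metrics in by_provider.items()
            if metrics[0] <= requirement.max_latency_ms
            and metrics[1] <= requirement.max_loss_pct
        }
        if eligible:
            policy[prefix] = min(eligible, key=lambda p: eligible[p][0])
    return policy


measurements = {
    "198.51.100.0/24": {"transit-a": (120.0, 0.5), "transit-b": (35.0, 0.1)},
    "203.0.113.0/24": {"transit-a": (200.0, 3.0), "transit-b": (150.0, 2.5)},
}

policy = build_policy(measurements, Requirement(max_latency_ms=80.0, max_loss_pct=1.0))
print(policy)
```

In this example only the first prefix has a path that satisfies both thresholds, so it is the only entry in the resulting policy.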

Once prefixes match their performance requirements, customers can opt to override the best path selection of the Border Gateway Protocol. Detailed analytic reports are generated and delivered to the customer through the front-end dashboard, providing end-to-end visibility into network performance.

Benefits include:

Network route optimization
Latency optimization
Bandwidth cost optimization
Avoid network congestion
Reduce packet loss
Real-time network analysis

Conclusion

Managing AWS network performance involves specific problems that you need to solve to get the most out of your deployment. If you have questions about how we can help you optimize your cloud costs, performance and security, contact us today.
