Why is SRE Becoming 2021's Hottest Hire?

Ganesh The Awesome Senior Pre & Post-Sales Engineer at GlobalDots

3rd June, 2021 4 Min read

In the current IT market, one of the hottest job roles is the Site Reliability Engineer (SRE). In January 2019, LinkedIn ranked SRE as the second most promising job in the USA, citing these statistics:

Median Base Salary: $200,000
Job Openings (YoY Growth): 1,400+ (72%)
Career Advancement Score (out of 10): 9

In this post we look at what an SRE does in their daily work, a little history of Site Reliability Engineering, what its foundations are, and how you can become an SRE.

What Does an SRE Do?

DevOps and Site Reliability Engineering are different disciplines, but they are not competitors; they complement each other. A separate blog post explains the differences between Site Reliability Engineering and DevOps. Here we focus strictly on the characteristics of the SRE role.

Site Reliability Engineering is the application of software engineering to operational problems. The word ‘Reliability’ signals that an SRE has a particular role in an organisation and in the Software Development Life Cycle (SDLC): SREs teach application developers how to build reliable services. In addition, they ensure that the organisation's computer systems run correctly, 24/7. Security, stability and scalability are very important here, because the business wants reliable services.

Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics.

The SRE is therefore a vital role within an organization. Typical SRE activities include:

Develop and manage scalable, secure and stable systems
Conduct incident analysis
Analyze performance and create improvement plans
Monitor system efficiency
Manage risks
Automate manual tasks within the SDLC
Build automated service tools, logs and test environments to ease the engineers’ workload
Implement new features
Select infrastructure tools
Adapt environments to increasing or decreasing numbers of users

Have a look at “The Ultimate Guide to SRE Acronyms” if you want to learn how to “talk SRE.”

A Little History About SREs

The term ‘Site Reliability Engineer’ was coined at Google by Ben Treynor Sloss, VP of Engineering, in 2003. He was hired by Google to manage a team of software developers running a production environment. Continuous development, integration and operations demanded a new way of thinking. That’s how Site Reliability Engineering came to be.

Ben Treynor Sloss explained the core of the SRE role in an interview:

“SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor. In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.”

Now that we know the origin of SRE, we can ask: what is this role built on?

What are the Foundations of SRE?

Site Reliability Engineering is based on the following:

Scalability – The system can handle a growing amount of work when resources are added to it
Availability – The system works as required
Incident Response – Handling incidents that affect the system
Automation – Automating the Software Development Life Cycle workflow
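To make the availability item above a little more concrete, here is a minimal Python sketch. It assumes availability is measured as the fraction of a period during which the service worked as required; the original article does not define a formula, so the function names and the numbers used are hypothetical illustrations, not a prescribed method.

```python
def availability(total_minutes: float, downtime_minutes: float) -> float:
    """Return the fraction of the period during which the service was up.

    Assumption: availability = (total time - downtime) / total time.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime(total_minutes: float, target: float) -> float:
    """Return how many minutes of downtime a given availability target permits."""
    if not 0 <= target <= 1:
        raise ValueError("target must be a fraction between 0 and 1")
    return total_minutes * (1 - target)


if __name__ == "__main__":
    # Hypothetical example: a 30-day month with 43.2 minutes of downtime.
    minutes_in_month = 30 * 24 * 60
    print(f"Availability: {availability(minutes_in_month, 43.2):.4%}")
    print(f"Downtime allowed at a 99.9% target: "
          f"{allowed_downtime(minutes_in_month, 0.999):.1f} minutes")
```

Numbers like these give development and operations a shared, measurable definition of "works as required".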

These fundamental elements are embedded in the job of an SRE in a balanced and efficient manner, to deal with the daily work in the organisation. To do this, an SRE needs a toolbox.

What is in a typical SRE Toolbox?

A Site Reliability Engineer works with the following software, languages, and tools:

Software languages: Ruby, Python, C++, Bash, Java
JavaScript ecosystem: Node.js, React, TypeScript
Cloud computing services: AWS, Azure
Infrastructure tooling: Terraform, CloudFormation, Ansible
Container tooling: Kubernetes, Docker, Mesos

As you can see, an SRE must have both development and operations skills to automate the manual tasks of a development team.

How to Become an SRE

Currently, SREs are in high demand, but it is not an easy job. As stated earlier, an SRE needs both development and operations skills: a Pi-shaped skill set. That means being proficient in both trades, not just one of them, which would be a T-shaped skill set. This makes SRE a very demanding and practical career. It helps to have a solid knowledge base to start from; check out the Top 10 SRE Books to Read in 2021. However, it can also be learned on the job with the right motivation and endurance. Most SREs have a background or education in software development or in systems and network engineering.

At Google, SREs do at least 50% development during their daily job. An SRE is still a software developer; an engineer doing operations.

Do you want to become an SRE? Big tech companies, Google included, want you because they know SREs are very hard to find. Is this because a good SRE ultimately ‘automates their way out of a job’?

We hope that this article showed you what a Site Reliability Engineer does, why the role is in high demand and how you can become one. For more information, you can take a look at Google’s take on SRE as well as the excellent series of videos that they posted on YouTube.

To learn more about the SRE toolkit, visit our solution page and grab the solution brief.

Originally posted by StackPulse.
