Site Reliability Engineer
Location: Dearborn
Posted on: June 23, 2025
Job Description:
Job Title: Site Reliability Engineer
Job Location: Detroit, Michigan 48221
Onsite Requirements: Remote
- Worked 2-3 years as an SRE in Azure
- Used Terraform, GitHub and Ansible for automating SRE job functions (not in a DevOps role)
- Willingness to lead incident management for issues in the Azure cloud, as that is a key responsibility of an SRE

Job Description:
The Cloud Site
Reliability Engineer (SRE) works closely with the cloud development
team, the IT operations team and business partners to streamline and
implement enhanced monitoring and alerting capability across
infrastructure and application layers. By leveraging automation tools,
SREs address and resolve issues, minimizing manual workload and
enhancing system scalability and reliability. Their core focus lies
in standardization and automation to build and run fault-tolerant
systems. Typically, SREs possess a background in software
engineering, system engineering, or system administration, coupled
with substantial IT operations experience. SREs oversee
availability, latency, performance, efficiency, change management,
monitoring, emergency response, and capacity planning.

Key Accountabilities:
- Write and develop code to automate processes, such as analyzing logs, testing production environments and responding to any issues
- Collaborate with agile teams and business partners to develop specifications that address problems and enhancement needs, including a focus on monitoring and metrics for operational readiness
- Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them
- Develop new capabilities in displaying, monitoring and alerting on key performance indicators by tracking business transactions in real time
- Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting
- Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
- Design and develop alert escalation and incident response automation
- Provide production support for cloud service outages and incidents and work on both tactical and strategic plans for outage prevention
- Provide feedback on resiliency and maintainability of solutions to Cloud and App architects
- Conduct disaster recovery scenario generation and testing
- Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management and related requirements

Must-have technical skills:
- Should have at least 3
years' experience as a site reliability engineer on a cross-functional agile team working in Azure
- Have working knowledge of agile development methodologies (Scrum, sprints, Kanban etc.) and tools (Azure DevOps etc.)
- Have at least 3 years' hands-on experience using IaC tools: Terraform, GitHub, Ansible and Packer
- Proven experience across testing, integration, source code management, deployment and containerization
- Sound problem-solving skills with the ability to quickly process complex information and present it clearly and simply
- Experience with cloud technologies and services including those for Compute, Storage, Databases and API Management
- On-premises to cloud migration experience

Required non-technical soft skills:
- Strong communication skills and ability to manage complex technical decisions
- Be a team player and coach, share knowledge and work towards building a trusted, passionate team
- Be a thinker and not an order taker. Have the courage and ability to think, understand and question before doing
- Have the courage to push back and say 'NO' if that is the right thing to do for the Client
- Have a continuous improvement mindset and be open to constantly finding better ways of solving security issues

3rd party and subcontract staffing agencies are not eligible for partnership on this position. 3rd party subcontractors need not apply. This position requires candidates to be eligible to work in the United States, directly for an employer, without sponsorship now or anytime in the future.