AWS Cloud DevOps Incident Manager

2024-144951
Systems Engineering / Development / Architecture / Integration
Public Trust

Telecommute Options:

Remote work allowed 100%
About Peraton

Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.

Responsibilities

We are looking for an AWS Cloud DevOps Incident Manager. The person hired will play a critical role in ensuring the reliability and availability of our software systems and services by effectively managing and responding to incidents. This individual will lead the incident response process, coordinating cross-functional teams, implementing incident management best practices, and driving continuous improvements to minimize future incidents.

What you will do:

  • Lead and coordinate the end-to-end incident management process, from detection and diagnosis to resolution and post-incident analysis.
  • Establish and enforce incident response procedures, ensuring that teams follow established protocols to minimize downtime and impact on users.
  • Collaborate with development, operations, and support teams to ensure a unified and coordinated response to incidents.
  • Monitor system health and performance metrics to proactively identify potential incidents and address them before they escalate.
  • Act as the point of contact during high-severity incidents, keeping stakeholders informed and managing communication with internal and external parties.
  • Conduct post-incident reviews to identify root causes, contributing factors, and areas for improvement. Implement corrective actions to prevent similar incidents from occurring.
  • Drive continuous improvement by analyzing incident trends, identifying recurring issues, and working with teams to implement solutions.
  • Develop and maintain documentation related to incident response procedures, including runbooks, escalation paths, and communication guidelines.
  • Create dashboards and reports to provide insights into operational performance and health.
  • Provide mentoring and guidance to team members to enhance incident response skills and overall operational excellence.
  • Collaborate with engineering teams to ensure that incident learnings are integrated into the software development lifecycle to improve overall system resilience.
  • Stay up to date with industry best practices, emerging technologies, and trends related to incident management and reliability engineering.

Qualifications

Required Qualifications:

  • Minimum of 8 years of experience with a BS/BA, 6 years with an MS/MA, or 3 years with a PhD. Additional years of experience may be accepted in lieu of the degree.
  • Proven experience in incident management or a related role within a DevOps or Site Reliability Engineering (SRE) environment.
  • Strong understanding of software development, infrastructure, and AWS cloud technologies.
  • Familiarity with incident management tools and systems, such as incident tracking software and monitoring platforms.
  • Excellent problem-solving and critical-thinking skills, with the ability to handle high-pressure situations calmly and methodically.
  • Excellent communication and interpersonal skills, including the ability to lead cross-functional teams and communicate effectively with technical and non-technical stakeholders.
  • Strong ability to learn new technologies and to coordinate interrelated, highly visible activities.
  • Must be able to multitask and work well with changing priorities in a fast-paced, 24x7 environment.
  • Experience with continuous integration and continuous delivery (CI/CD) pipelines is a plus.
  • Relevant certifications in incident management, DevOps, or related areas are desirable.
  • Must be a U.S. citizen.
  • Ability to obtain and maintain a High Risk Public Trust 6C is required.

Preferred Qualifications:

  • High Risk Public Trust or Secret Clearance preferred.

Benefits:
 
At Peraton, our benefits are designed to help keep you at your best beyond the work you do with us daily. We’re fully committed to the growth of our employees. From fully comprehensive medical plans to tuition reimbursement, tuition assistance, and fertility treatment, we are there to support you all the way.

Target Salary Range

$86,000 - $138,000. This represents the typical salary range for this position based on experience and other factors.

EEO

An Equal Opportunity Employer including Disability/Veteran.

Benefits

  • Paid Time-Off and Holidays
  • Retirement
  • Life & Disability Insurance
  • Career Development
  • Tuition Assistance and Student Loan Financing
  • Paid Parental Leave
  • Additional Benefits
  • Medical, Dental, & Vision Care