Site Reliability Engineer

iCIMS
Holmdel, NJ

Job Summary

We are seeking a skilled Site Reliability Engineer (SRE) to contribute to the reliability, scalability, and performance of our multi-cloud SaaS platform, which serves thousands of customers worldwide. The role involves hands-on technical work in incident response, system monitoring, automation, and continuous improvement of platform reliability. The successful candidate will work within a global SRE team to ensure optimal system performance and customer satisfaction.

Responsibilities

  • System Monitoring & Reliability:
    • Monitor multi-cloud infrastructure (AWS, Azure, GCP) using New Relic, Grafana, and Sumo Logic
    • Maintain reliability of AWS resources, Auth0/Okta authentication, databases, and legacy applications
    • Implement monitoring, alerting, and dashboards for assigned systems
  • Incident Management & Response:
    • Respond to alerts and incidents within SLA timeframes
    • Perform root cause analysis and document findings
    • Create and maintain runbooks and troubleshooting procedures
    • Participate in 24/7 on-call rotation
  • Automation & Improvement:
    • Develop scripts to reduce manual operational overhead
    • Build monitoring and alerting solutions
    • Support infrastructure-as-code initiatives
    • Implement automated remediation where possible
  • Success Metrics:
    • Customer Impact: Reduced MTTR and improved customer satisfaction scores
    • Reliability: Achievement of 99.9%+ uptime SLAs across all products and regions
    • Proactive Prevention: Reduction in incident frequency through automated detection and prevention
    • Cross-functional Collaboration: Improved partnership metrics with Product, Engineering, and Customer Success teams
    • Automation Delivery: Complete assigned automation projects to reduce manual tasks
    • Knowledge Sharing: Contribute to team knowledge base and mentor junior engineers

Qualifications

  • 4+ years of experience in SRE, DevOps, or Infrastructure Engineering
  • Hands-on experience with AWS (required) and Azure (preferred)
  • Strong Linux system administration skills
  • Experience with monitoring tools (New Relic, Grafana, Prometheus)
  • Scripting skills in Python, Bash, or similar
  • Knowledge of databases (SQL Server, PostgreSQL, MongoDB)

Posted 2025-10-21
