Site Reliability and Operations Engineer
04/29/2025
Contract
Active
Job Description:
Job Summary:
We are seeking a highly skilled Site Reliability and Operations Engineer (SRE) with a robust background in Kubernetes-based distributed caching and compute grid systems. The ideal candidate will possess a solid blend of infrastructure engineering and software development skills. This role will focus on the design, implementation, and maintenance of high-performance distributed platforms to ensure high availability, scalability, and system observability.
Job Responsibilities:
Development & Implementation:
- Design, build, and enhance distributed caching and compute grid solutions on Kubernetes/OpenShift platforms.
- Leverage technologies such as IBM Spectrum Symphony, Tibco Grid Server, or similar for high-throughput compute grids.
- Use container tooling (Docker, Helm) to package, deploy, and orchestrate microservices and container workloads.
- Apply parallel compute strategies and optimize load balancing for application performance.
Site Reliability Engineering (SRE):
- Ensure platform reliability, scalability, and minimal downtime by maintaining robust distributed systems.
- Implement and maintain observability and monitoring using Prometheus, Grafana, ELK, or OpenTelemetry.
- Automate infrastructure provisioning and deployments using Ansible, Helm Charts, and similar tools.
- Troubleshoot complex system and infrastructure issues in Kubernetes environments.
- Support CI/CD processes using tools like Jenkins, ArgoCD, and GitHub Actions.
Required Skills & Qualifications:
- Strong experience with Kubernetes, including OpenShift, across both on-prem and cloud environments.
- Proficiency in at least one programming language: Java, Go, or Python.
- In-depth knowledge of container tooling such as Docker and Helm.
- Hands-on experience with CI/CD tools and pipeline integration.
- Expertise in observability and monitoring using Prometheus, Grafana, Loki, and Jaeger.
- Knowledge of service meshes like Istio or Linkerd.
- Experience in multi-cluster and hybrid cloud Kubernetes deployments.
- Solid understanding of networking, security practices, and performance optimization in distributed systems.
- Experience with high-performance computing platforms or grid computing frameworks.
- Familiarity with distributed caching strategies and data sharding.
- Strong communication and documentation skills.
- Relevant certifications (e.g., CKAD, CKA, Red Hat Certified Specialist in OpenShift).