Solutions Engineer - AI/HPC Infrastructure
Location: US (Eastern or Central time zone preferred)
Remote (work from home), with travel to customer sites
DriveNets is a leader in disaggregated high-scale networking solutions for service providers and AI infrastructures. Founded in December 2015, DriveNets created a radical new way to build networks by adapting the architectural model of the cloud to telco-grade networking. This solution accelerates network deployment, improves the network’s economic model, and radically simplifies network operations. Customers include Comcast, Orange, and KDDI, and over 80% of AT&T’s network traffic now runs through a disaggregated core powered by DriveNets software. The DriveNets Network Cloud-AI solution, based on the same technology, was introduced to the market in 2023, providing the highest-performance Ethernet-based AI networking solution, and is already deployed by hyperscalers, neoclouds, and enterprises. Having raised over $587 million in three funding rounds, DriveNets continues to deploy the most innovative network infrastructure and is looking for the most talented people to be part of this journey.
The Role
As a Solutions Engineer, you will play a pivotal role in designing, deploying, and optimizing DriveNets’ Network Cloud AI Infrastructure solutions. This individual contributor role requires a blend of technical expertise, leadership, and hands-on experience to implement cutting-edge solutions for our customers. You will collaborate with sales engineering teams, customers, and cross-functional teams (including Product Management, Solution Architects, Engineering, and Marketing) to define technical requirements, articulate solution value, and ensure successful on-site deployment.
Key responsibilities include guiding customers through the design and deployment process, aligning technical solutions with business needs, and providing critical feedback to improve DriveNets’ product offerings. This position demands strong technical acumen, exceptional communication skills, and the ability to lead complex, high-impact projects in dynamic environments.
Responsibilities:
- Building robust AI/HPC infrastructure for new and existing customers.
- Build and support NVIDIA- and AMD-based platforms in a hands-on technical capacity.
- Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
- Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
- Design and plan rack layouts and network topologies to support customer requirements.
- Design and evaluate automation scripts for network operations and for configuring server and switch fabrics.
- Perform NCCL, RCCL, LLM, and RDMA performance benchmarks as part of the design and evaluation process of the deployment.
- Benchmark the latest GPU compute and NIC solutions from all major compute vendors over the DriveNets networking fabric.
- Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.
- Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.
- Introduce new products to DriveNets’ sales and support teams and to DriveNets’ customers.
- Deliver technical trainings and TOIs for support/sales engineers, partners, and customers.
- Collaborate on product definition through customer requirement gathering and roadmap planning.
REQUIREMENTS
What we need to see:
- 5+ years of previous experience deploying and administering AI/HPC clusters or general-purpose compute systems.
- 5+ years of hands-on Linux experience (e.g., RHEL, CentOS, Ubuntu) and production infrastructure support (e.g., networking, storage, monitoring, compute, installation, configuration, maintenance, upgrade, retirement)
- Proficiency in Cloud, Virtualization, and Container technologies.
- Deep understanding of operating systems, computer networks, and high-performance applications
- Hands-on experience with Bash, Python, and configuration management tools (e.g., Ansible).
- Established record of leading technical initiatives and delivering results.
- Ability to write extensive technical content (white papers, technical briefs, test reports, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging
- Ability to travel domestically and internationally.
Ways to stand out from the crowd:
- Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPU, DPU, etc.
- Familiarity with GPU resource scheduling managers (Slurm, Kubernetes, etc.)
- Expertise with NCCL/RCCL, setting up GPU environments, tuning these environments, and collecting benchmark results.
- Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP, etc.).
- Understanding of data center operations fundamentals in networking, cooling, and power
- Proven experience with one or more Tier-1 clouds (AWS, Azure, GCP, or OCI) or emerging neoclouds, and with cloud-native architectures and software.
- Understanding of AI workload requirements and how they interact with other parts of the system, such as networking, storage, and deep learning frameworks.
- Knowledge of AI/ML frameworks (e.g., TensorFlow, PyTorch) and associated tooling is an advantage.
DriveNets is an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with disability, or other applicable legally protected characteristics.
More About DriveNets
Based in Israel, with locations in Romania, the US, and Japan as well as extended teams, DriveNets’ operations cover more than 10 countries. With recognition by industry analysts and through numerous industry awards, DriveNets is pushing market momentum, allowing for faster service innovation from the network core to the edge.