
Software Engineering / AI Infrastructure

Baseten · San Francisco, CA or New York, NY
Posted 3 months ago
Deadline: Not specified
Full Time · Mid-Level · Software Engineering

Baseten powers AI inference for some of the world’s leading AI-driven companies, including Sourcegraph, Writer, Gamma, and OpenEvidence. Backed by top investors and a recent $150M Series D, Baseten is scaling its engineering team to meet growing demand for high-performance model deployment infrastructure.

The company seeks a Software Engineer - Model APIs to join the Model Performance team. In this role, you will design, build, and optimize the core infrastructure that enables scalable, efficient model serving across distributed systems. You’ll work on performance-critical components, from TensorRT-LLM kernels to API reliability, and collaborate across teams to deliver developer-friendly solutions that set new standards for inference performance. This is a rare opportunity to shape how some of the world’s most dynamic AI organizations deploy and scale large language models in production. If you’re a systems-oriented engineer with a passion for performance, distributed systems, and real-world AI infrastructure, Baseten wants to hear from you.


Applications accepted until the position is filled

Requirements

1. 3+ years of experience building or maintaining distributed systems or large-scale APIs.
2. Proven track record of operating low-latency backend services (auth, quotas, metering, rate limiting).
3. Strong skills in profiling, tracing, and optimizing system performance.
4. Ability to debug complex systems across runtime and GPU layers.
5. Excellent written and verbal communication with clear documentation skills.
6. (Preferred) Experience with LLM runtimes (vLLM, SGLang, TensorRT-LLM).
7. Familiarity with Kubernetes, service meshes, or distributed scheduling systems.
8. Background in developer-facing infrastructure or open-source API systems.

Benefits

1. $150K–$230K salary with generous equity grants.
2. Inclusive, growth-focused hybrid work environment.
3. Work with top AI startups and cutting-edge model infrastructure.
4. Comprehensive compensation and performance-based rewards.
5. Opportunities for professional development and learning in AI systems engineering.

Responsibilities

1. Design, build, and operate the Model API layer for high-performance inference.
2. Optimize CUDA and TensorRT-LLM kernels for speed and throughput.
3. Implement speculative decoding, quantization, and multi-GPU communication patterns.
4. Develop benchmarking frameworks to test performance across models and hardware.
5. Introduce deep observability using metrics, traces, and logging.
6. Build foundational API features like versioning, metering, and authentication.
7. Collaborate cross-functionally to deliver scalable and developer-friendly AI serving tools.
8. Drive innovation in inference performance and platform reliability.
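To give candidates a concrete sense of the API-layer work described above (metering, rate limiting), here is a minimal token-bucket rate limiter sketch. This is purely illustrative and not Baseten's actual implementation; the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Illustrative token-bucket rate limiter: sustains `rate` requests/sec
    while permitting bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # bucket starts full
        self.clock = clock        # injectable clock, handy for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if a request of the given cost may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In production, services of this kind typically track one bucket per API key and persist counters in a shared store so that limits hold across replicas; the sketch above shows only the core accounting.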

Employment Type: Full Time
Work Mode: On-site (San Francisco, CA or New York, NY)
