Diffusion Inference Engineer

$180,000 - $250,000

3+ years experience
About The Role

RadixArk is looking for a Diffusion Inference Engineer to optimize high-performance serving systems for image and video generation models. You'll push the limits of inference efficiency for models like Flux-1, Flux-2, Wan 2.1/2.2, and next-generation architectures, integrating them into SGLang's serving infrastructure. This role focuses on making diffusion inference faster, cheaper, and more scalable in production.

Requirements

  • 3+ years experience building ML inference systems for generative models, computer vision, or large-scale serving

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent industry experience

  • Strong understanding of diffusion models: sampling algorithms, noise schedules, latent diffusion, guidance techniques

  • Experience optimizing transformer-based (DiT/Flux) or U-Net architectures for inference

  • Proficiency in Python and PyTorch with production-quality code standards

  • Familiarity with model optimization techniques: quantization, flash attention, kernel fusion

  • Experience with CUDA, Triton, or GPU performance profiling is a plus

  • Understanding of VAEs, attention mechanisms, and multi-modal architectures

Responsibilities

  • Build and optimize high-performance serving systems for image and video generation models (Flux-1, Flux-2, Wan 2.1/2.2, Qwen Image Edit, Zed Image Turbo)

  • Implement efficient sampling algorithms: DDPM, DDIM, DPM-Solver, Euler, and custom schedulers

  • Optimize inference latency and throughput for text-to-image, image-to-image, and video generation workloads

  • Design memory-efficient serving architectures for high-resolution generation and long video sequences

  • Integrate diffusion models into SGLang with batching, caching, and scheduling optimizations

  • Profile and optimize model components: VAE encoding/decoding, DiT/U-Net forward passes, attention layers

  • Implement quantization and mixed-precision strategies (FP16, BF16, INT8) for production serving

  • Collaborate with kernel engineers to optimize attention, convolution, and sampling operations

  • Build benchmarks comparing inference performance across hardware (H100, B200, TPU) and configurations

  • Support multi-model serving pipelines: LoRA adapters, ControlNet, IP-Adapter integrations

  • Contribute optimizations to open-source diffusion serving frameworks

  • Write technical documentation on diffusion inference optimization and deployment best practices

About RadixArk

RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems at xAI, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework). We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training. Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs. We're backed by well-known investors in the infrastructure field and partner with Google, AWS, and frontier AI labs. Join us in building infrastructure that gives real leverage back to the AI community.

Compensation

We offer competitive compensation with significant founding team equity, comprehensive health benefits, and flexible work arrangements. The US base salary range for this full-time position is $180,000 - $250,000, plus equity and benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and demonstrated expertise in diffusion inference and ML infrastructure.

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or any other characteristic protected by law.


Copyright © 2025 RadixArk

contact@radixark.ai
