GPU Kernel Engineer (CUDA / Triton / Pallas)

$180,000 - $250,000

3+ years experience

Apply Now

About The Role

RadixArk is looking for a GPU Kernel Engineer to develop the high-performance kernels that power SGLang and our next-generation LLM serving infrastructure. In this role, you will implement CUDA, Triton, or Pallas kernels for transformer architectures, with a focus on attention mechanisms, memory optimization, and integration with inference and training runtimes.
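To give candidates a concrete sense of the work, here is a minimal, illustrative Triton sketch of a numerically stable row-wise softmax, the normalization at the heart of attention. It is a teaching example only, not RadixArk or SGLang production code:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def softmax_kernel(out_ptr, in_ptr, row_stride, n_cols, BLOCK_SIZE: tl.constexpr):
        # One program instance handles one row; the columns of that row form one block.
        row = tl.program_id(axis=0)
        col_offsets = tl.arange(0, BLOCK_SIZE)
        mask = col_offsets < n_cols
        x = tl.load(in_ptr + row * row_stride + col_offsets, mask=mask, other=-float("inf"))
        # Numerically stable softmax: subtract the row max before exponentiating.
        x = x - tl.max(x, axis=0)
        num = tl.exp(x)
        y = num / tl.sum(num, axis=0)
        tl.store(out_ptr + row * row_stride + col_offsets, y, mask=mask)

    def softmax(x: torch.Tensor) -> torch.Tensor:
        # Assumes a 2-D CUDA tensor; output shares the input's layout.
        n_rows, n_cols = x.shape
        out = torch.empty_like(x)
        # BLOCK_SIZE must be a power of two that covers the row length.
        BLOCK_SIZE = triton.next_power_of_2(n_cols)
        softmax_kernel[(n_rows,)](out, x, x.stride(0), n_cols, BLOCK_SIZE=BLOCK_SIZE)
        return out

Production kernels go much further, tiling across rows and heads, fusing with surrounding matmuls, and handling FP8 or quantized inputs, but the same ingredients of block tiling, masking, and on-chip reductions carry over.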

Requirements

  • 3+ years experience writing CUDA, Triton, or Pallas kernels for production systems

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or equivalent industry experience

  • Deep understanding of GPU memory hierarchy, tiling strategies, vectorization, and warp-level programming

  • Proficiency in Python and C++ with demonstrated ability to write high-performance, production-quality code

  • Experience with LLM inference/training frameworks (vLLM, SGLang, TensorRT-LLM, TGI) is highly valued

  • Proven track record optimizing attention mechanisms, KV cache operations, MoE routing, or quantization kernels

Responsibilities

  • Implement high-performance kernels: attention, KV cache, MoE, matmul, quantization ops, communication ops

  • Profile and optimize kernels, and integrate them into the inference and training runtimes used in production

  • Collaborate with compiler and systems engineers to achieve end-to-end performance wins

  • Benchmark kernel performance across GPU architectures (H100, H200, B200) and establish optimization standards

  • Contribute to SGLang and other open-source projects with kernel improvements and technical documentation

  • Conduct design and code reviews focused on performance, correctness, and maintainability

  • Create testing frameworks for kernel robustness and numerical stability (see the sketch after this list)
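As an illustration of the benchmarking and testing responsibilities above, the following sketch shows one common pattern: validate a custom kernel against a PyTorch reference within dtype-appropriate tolerances, then time it with CUDA events. The check_and_benchmark helper is hypothetical, not an existing RadixArk API:

    import torch

    def check_and_benchmark(kernel_fn, ref_fn, shape=(4096, 4096), dtype=torch.float16):
        """Compare a custom kernel against a reference and report its average latency."""
        x = torch.randn(shape, device="cuda", dtype=dtype)

        # Correctness: compare against the reference within tolerances suited to fp16.
        torch.testing.assert_close(kernel_fn(x), ref_fn(x), rtol=1e-2, atol=1e-2)

        # Timing: warm up, then measure with CUDA events so only GPU time is captured.
        for _ in range(10):
            kernel_fn(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(100):
            kernel_fn(x)
        end.record()
        torch.cuda.synchronize()
        print(f"avg latency: {start.elapsed_time(end) / 100:.3f} ms")

    # Example usage with the softmax sketch above (hypothetical kernel under test):
    # check_and_benchmark(softmax, lambda t: torch.softmax(t, dim=-1))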

About RadixArk

RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems at xAI, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework). We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training. Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs. We're backed by well-known investors in the infrastructure field and partner with Google, AWS, and frontier AI labs. Join us in building infrastructure that gives real leverage back to the AI community.

Compensation

We offer competitive compensation with significant founding team equity, comprehensive health benefits, and flexible work arrangements. The US base salary range for this full-time position is: $180,000 - $250,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and demonstrated expertise in GPU computing and ML systems.

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

© 2025 RadixArk

contact@radixark.ai
