[Feature][P0]: Switch to Runtime Base Image
🚀 The feature, motivation and pitch
Description
The Dockerfile currently uses nvidia/cuda:12.9.1-devel-ubuntu22.04 as the final base image. The devel variant includes the full CUDA compiler toolchain (~7 GB), which is needed only at build time, not at runtime. Switching to the runtime variant will significantly reduce image size.
What You'll Do
- Change `FINAL_BASE_IMAGE` from `devel` to `runtime` (line 24)
- Analyze if any runtime components actually need build tools
- Handle FlashInfer JIT compilation requirements:
  - Test if AOT wheels work without build deps
  - If needed, add conditional minimal build tools
- Verify all GPU functionality works with runtime image
- Update documentation
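
A minimal sketch of what the steps above could look like in the Dockerfile. Everything here is illustrative rather than the project's actual change: the `runtime` tag is assumed to mirror the existing `devel` tag, the stage name `final` and the `INSTALL_JIT_BUILD_DEPS` build arg are hypothetical, and the apt package list is a guess at "minimal build tools" that would need to be confirmed by testing FlashInfer.

```dockerfile
# Sketch only. Assumption: a runtime tag exists that mirrors the devel tag in use today.
ARG FINAL_BASE_IMAGE=nvidia/cuda:12.9.1-runtime-ubuntu22.04
FROM ${FINAL_BASE_IMAGE} AS final

# Hypothetical switch: set to 1 only if FlashInfer JIT turns out to need a compiler at runtime.
ARG INSTALL_JIT_BUILD_DEPS=0

# Install a minimal toolchain only when requested, so the default image stays small.
# Package names are assumptions and should be verified against the CUDA apt repository.
RUN if [ "${INSTALL_JIT_BUILD_DEPS}" = "1" ]; then \
        apt-get update \
        && apt-get install -y --no-install-recommends \
            gcc g++ cuda-nvcc-12-9 cuda-cudart-dev-12-9 \
        && rm -rf /var/lib/apt/lists/*; \
    fi
```

With a layout like this, the JIT-capable variant would be built with `docker build --build-arg INSTALL_JIT_BUILD_DEPS=1 .`, while the default build skips the toolchain entirely.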
Deliverables
- [ ] Modified Dockerfile with runtime base image
- [ ] Conditional build dependency installation for FlashInfer (if needed)
- [ ] GPU functionality test results
- [ ] Before/after image size comparison
Given DeepGEMM seems to be jitting kernels, would that not require it to AOT compile kernels as well?
Either way, this would have to be merged first: https://github.com/vllm-project/vllm/pull/26966. I'll fix up the merge conflicts as soon as v0.11.1 is out, as I was told to hold off until then.
@bbartels I'm going to split the AOT vs. JIT question out of this ticket. For now, simply switching the base image to runtime and explicitly installing the tools and headers we need should still save us a bunch of space. Something like https://github.com/vllm-project/vllm/pull/28727
Sounds good, I'll fix up the source compilation PR later today. That should save some space as well!
https://github.com/vllm-project/vllm/pull/28727 is ready for review and saves 3 GB of space.