vllm
[CPU] V1 support for the CPU backend
Resolves #16056.
Supports all features listed in the CPU doc except the FP8 KV cache.
Changes
- Add V1 `CPUWorker` and `CPUModelRunner`, derived from `Worker` and `GPUModelRunner`, to reduce code duplication.
- Add V1 `TorchSDPABackend` with interfaces compatible with `GPUModelRunner`, such as `reorder_batch` and `build`.
- Additional changes in `GPUModelRunner` to avoid explicitly importing flash-attn and using the default `nccl` dist backend.
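
The last change can be illustrated with a minimal sketch. This is not the code in this PR: the helper names `select_dist_backend`, `flash_attn_available`, and `pick_attention_backend` are hypothetical, as are the returned backend labels. It only shows the general idea of choosing a distributed backend by device type and probing for flash-attn without importing it, so that a CPU-only install never touches CUDA-only dependencies.

```python
import importlib.util


def select_dist_backend(device_type: str) -> str:
    """Pick a torch.distributed backend name for a device type (hypothetical helper)."""
    # "nccl" requires CUDA devices; "gloo" is the usual choice for CPU.
    return "nccl" if device_type == "cuda" else "gloo"


def flash_attn_available() -> bool:
    """Report whether the flash_attn package is installed, without importing it."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attention_backend(device_type: str) -> str:
    """Return an illustrative backend label; the labels are assumptions."""
    if device_type == "cuda" and flash_attn_available():
        return "flash_attn"
    # On CPU, or when flash-attn is missing, fall back to an SDPA-based path.
    return "torch_sdpa"


if __name__ == "__main__":
    print(select_dist_backend("cpu"))
    print(pick_attention_backend("cpu"))
```

Deferring the flash-attn check to a function call, rather than a module-level import, is what lets a shared model runner be imported on machines where the package is not installed.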