[CPU] V1 support for the CPU backend

Open bigPYJ1151 opened this issue 7 months ago • 9 comments
Resolves #16056.

Supports all features listed in the CPU doc except FP8 KV cache.

Changes

  • Add V1 CPUWorker and CPUModelRunner, derived from Worker and GPUModelRunner respectively, to reduce code duplication.
  • Add V1 TorchSDPABackend with interfaces compatible with GPUModelRunner, such as reorder_batch and build.
  • Adjust GPUModelRunner so that it no longer imports flash-attn explicitly or defaults to the nccl dist backend.
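
For readers unfamiliar with the pattern, the changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: BaseModelRunner, CpuModelRunner, pick_dist_backend, and flash_attn_available are hypothetical names, and the choice of "gloo" as the CPU dist backend is an assumption. The point is that a CPU runner can inherit the shared logic and override only the device-specific choices, while the base class probes for flash-attn instead of importing it unconditionally.

```python
import importlib.util


def pick_dist_backend(device_type: str) -> str:
    """Pick a distributed backend name by device type.

    Assumption: "gloo" for CPU. The PR only says nccl is no longer the default.
    """
    return "nccl" if device_type == "cuda" else "gloo"


def flash_attn_available() -> bool:
    """Probe for the flash_attn package without importing it."""
    return importlib.util.find_spec("flash_attn") is not None


class BaseModelRunner:
    """Hypothetical stand-in for a GPU-oriented runner (not vLLM's class)."""

    device_type = "cuda"

    def __init__(self) -> None:
        # Device-specific choices are resolved through overridable hooks.
        self.dist_backend = pick_dist_backend(self.device_type)
        self.attn_backend = self.select_attn_backend()

    def select_attn_backend(self) -> str:
        # Fall back to SDPA when flash-attn is not installed.
        return "flash_attn" if flash_attn_available() else "torch_sdpa"

    def execute(self, batch: list[int]) -> list[int]:
        # Placeholder for shared logic that subclasses inherit unchanged.
        return [x * 2 for x in batch]


class CpuModelRunner(BaseModelRunner):
    """Hypothetical CPU runner: reuses execute(), overrides device choices."""

    device_type = "cpu"

    def select_attn_backend(self) -> str:
        # The CPU path always uses the SDPA-based backend in this sketch.
        return "torch_sdpa"


if __name__ == "__main__":
    runner = CpuModelRunner()
    print(runner.dist_backend, runner.attn_backend, runner.execute([1, 2]))
```

Keeping the backend selection in small overridable hooks means the CPU subclass does not need to copy the batch-handling logic, which is the duplication the first bullet aims to avoid.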

bigPYJ1151, Apr 11 '25 01:04