vllm



[CPU] V1 support for the CPU backend

Open bigPYJ1151 opened this issue 7 months ago • 9 comments


Resolves #16056

Supports all features listed in the CPU doc except the FP8 KV cache.

Changes

Add V1 CPUWorker and CPUModelRunner, derived from Worker and GPUModelRunner to reduce code duplication.
Add V1 TorchSDPABackend with compatible interfaces for GPUModelRunner, such as reorder_batch and build.
Adjust GPUModelRunner so that it no longer imports flash-attn explicitly or assumes the default nccl dist backend.
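
The pattern behind these changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BaseModelRunner`, `CPUModelRunnerSketch`, `dist_backend`, and `load_optional` are hypothetical names invented for the example. Only the backend strings `"nccl"` and `"gloo"` are real `torch.distributed` backend names. The idea is that the CPU runner inherits everything from the GPU-oriented base and overrides only the device type, while the base avoids hard-coding a GPU-only backend or import.

```python
import importlib


class BaseModelRunner:
    """Hypothetical stand-in for a GPU-oriented runner (not vLLM's actual class)."""

    device_type = "cuda"

    def dist_backend(self) -> str:
        # Choose the distributed backend from the device type instead of
        # always defaulting to "nccl", which is only usable with GPUs.
        return "nccl" if self.device_type == "cuda" else "gloo"

    def load_optional(self, module_name: str):
        # Import lazily and tolerate absence, so a GPU-only package
        # (flash-attn, for example) is not required on a CPU-only host.
        try:
            return importlib.import_module(module_name)
        except ImportError:
            return None


class CPUModelRunnerSketch(BaseModelRunner):
    """Hypothetical CPU runner: reuses the base logic, overrides only the device."""

    device_type = "cpu"
```

With this shape, `CPUModelRunnerSketch().dist_backend()` yields `"gloo"` without duplicating any of the base runner's logic, and an optional module that is not installed comes back as `None` rather than raising at import time.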

Apr 11 '25 01:04 bigPYJ1151

Labels

documentation

ci/build

v1
