Li, Jiang
This PR adds a new CPU backend to vLLM that supports basic model inference with the BF16 and FP32 dtypes. FP16 support and TP support will be added in...
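For context, inference on the new backend looks like ordinary vLLM usage. The sketch below is illustrative only: the model name is arbitrary, and it assumes a CPU-enabled vLLM build where the CPU backend is selected automatically.

```python
# Minimal sketch of BF16 inference on the CPU backend
# (assumes vLLM was built for CPU; model choice is illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", dtype="bfloat16")
params = SamplingParams(temperature=0.8, max_tokens=32)
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```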
Hi, vLLM geniuses @WoosukKwon @zhuohan123. Motivated by requirements to execute vLLM on the CPU (e.g., #176), we recently implemented an initial prototype for CPU-only execution on the x86...
## Progress
- [ ] Integrate CPU executor to support the basic model inference (BF16/FP32) without TP.
  - #3634
  - #3824
  - #4113
- [ ] Support FP16 model inference....
For Trino:
- ShortTimestamp (a Long member, 64 bits)
- LongTimestamp (a Long member and an Int member, 96 bits)

For Velox: two Long members (128 bits)

```Timestamp(Precision)``` type signature...
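To make the layout difference concrete, here is a small plain-Python sketch converting between the two representations. The field semantics are an assumption based on the descriptions above (Trino's LongTimestamp holding epoch microseconds plus picoseconds within the microsecond, Velox holding whole seconds plus nanoseconds within the second); the function name is hypothetical.

```python
# Illustrative conversion from Trino LongTimestamp fields to Velox's
# two-field (seconds, nanos) layout. Field semantics are assumptions.
def trino_long_ts_to_velox(epoch_micros: int, picos_of_micro: int):
    seconds, rem_micros = divmod(epoch_micros, 1_000_000)
    # Picoseconds below nanosecond resolution are truncated.
    nanos = rem_micros * 1_000 + picos_of_micro // 1_000
    return seconds, nanos

print(trino_long_ts_to_velox(1_700_000_000_123_456, 789_000))
# -> (1700000000, 123456789)
```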
This PR enables vLLM multiprocessing in the CPU backend to improve async LLM engine performance and to support TP. The main changes include:
- Use utilities from ```vllm.executor.multiproc_worker_utils``` to manage workers in...
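For intuition, the worker-management pattern is roughly the one sketched below. This is not the actual ```vllm.executor.multiproc_worker_utils``` API; it is a hypothetical minimal version of the same idea: spawn one process per TP rank and dispatch method calls to all workers over queues.

```python
# Hypothetical minimal worker pool in the spirit of
# vllm.executor.multiproc_worker_utils (all names are illustrative).
import multiprocessing as mp

def _worker_loop(rank: int, task_q, result_q) -> None:
    # A real worker would hold the model shard for TP rank `rank`.
    while True:
        method, args = task_q.get()
        if method == "shutdown":
            break
        result_q.put((rank, f"{method} done on rank {rank}"))

class WorkerPool:
    def __init__(self, world_size: int):
        ctx = mp.get_context("spawn")  # fork is unsafe with some torch setups
        self.task_qs = [ctx.Queue() for _ in range(world_size)]
        self.result_q = ctx.Queue()
        self.procs = [
            ctx.Process(target=_worker_loop, args=(r, q, self.result_q))
            for r, q in enumerate(self.task_qs)
        ]
        for p in self.procs:
            p.start()

    def run_on_all(self, method: str, *args):
        for q in self.task_qs:
            q.put((method, args))
        return [self.result_q.get() for _ in self.procs]

    def shutdown(self):
        for q in self.task_qs:
            q.put(("shutdown", ()))
        for p in self.procs:
            p.join()

if __name__ == "__main__":
    pool = WorkerPool(world_size=2)
    print(pool.run_on_all("execute_model"))
    pool.shutdown()
```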
This PR provides the corresponding CPU kernels for compressed-tensor INT8 W8A8, based on oneDNN, to enable lowering compressed-tensor operations to the CPU device. Both static and dynamic modes are...
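To show the difference between the two modes, here is a small numeric sketch in plain PyTorch (no oneDNN; helper names are hypothetical): static quantization uses a scale calibrated ahead of time, while dynamic quantization derives the scale from each input at runtime.

```python
# Sketch of symmetric per-tensor INT8 activation quantization.
# Static mode: scale calibrated offline. Dynamic mode: scale computed
# from the current batch. (Illustrative only; the PR lowers to oneDNN.)
import torch

def quantize_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

def dynamic_scale(x: torch.Tensor) -> torch.Tensor:
    return x.abs().amax() / 127.0

x = torch.randn(4, 8)
static_scale = torch.tensor(0.05)        # assumed calibration result
q_static = quantize_int8(x, static_scale)
q_dynamic = quantize_int8(x, dynamic_scale(x))
# Round-trip error of in-range values is bounded by half the scale.
print((q_dynamic.float() * dynamic_scale(x) - x).abs().max())
```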
Generate custom activation ops using ```torch.compile``` for the CPU backend. Main changes to vLLM:
- ~~Add ```_forward_native_impl``` to each custom op to avoid recompilation caused by tracing ```self```.~~

For vicuna-7b-v1.5, there...
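As a concrete illustration of the recompilation point, compiling a module-level function (rather than a bound method, so ```self``` never enters the trace) could look like the sketch below. ```silu_and_mul``` mirrors vLLM's SiluAndMul activation; the compile invocation itself is an assumption about how the PR wires things up.

```python
# Sketch: generate a fused CPU activation kernel with torch.compile.
# Compiling a free function avoids guards/retracing tied to `self`.
import torch
import torch.nn.functional as F

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]

compiled_silu_and_mul = torch.compile(silu_and_mul, dynamic=True)

x = torch.randn(2, 16, dtype=torch.bfloat16)
print(compiled_silu_and_mul(x).shape)  # torch.Size([2, 8])
```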
Upgrade the CPU backend torch to 2.6.0; all tests are verified locally. Waiting for #12721