Li, Jiang


This PR adds a new CPU backend to vLLM and supports basic model inference with the BF16 and FP32 dtypes. FP16 and TP support will be added in...

Intel CPU
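A minimal usage sketch of what the PR above enables, assuming a vLLM build with the CPU backend installed; the model name is only an example:

```python
from vllm import LLM, SamplingParams

# Assumes vLLM was built with the CPU backend enabled.
# BF16 and FP32 are the dtypes this PR supports; FP16 comes later.
llm = LLM(model="facebook/opt-125m", dtype="bfloat16")  # or dtype="float32"
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```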

Hi, vLLM geniuses @WoosukKwon @zhuohan123. Motivated by requests to run vLLM on the CPU (e.g., #176), we recently implemented an initial prototype for CPU-only execution on the x86...

Intel CPU

## Progress

- [ ] Integrate the CPU executor to support basic model inference (BF16/FP32) without TP.
  - #3634
  - #3824
  - #4113
- [ ] Support FP16 model inference....

RFC
x86 CPU

For Trino:
- ShortTimestamp (a Long member, 64 bits)
- LongTimestamp (a Long member and an Int member, 96 bits)

For Velox: two Long members (128 bits).

The ```Timestamp(Precision)``` type signature...
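A hedged sketch of the conversion the snippet above implies, assuming Trino's LongTimestamp carries epoch microseconds plus picoseconds-of-microsecond and Velox's Timestamp stores whole seconds plus nanoseconds; the field meanings are inferred from the stated widths:

```python
def trino_long_to_velox(epoch_micros: int, picos_of_micro: int) -> tuple[int, int]:
    # Assumed layouts: Trino LongTimestamp = (epochMicros: 64-bit, picosOfMicro: 32-bit);
    # Velox Timestamp = (seconds: 64-bit, nanos: 64-bit).
    seconds, micros = divmod(epoch_micros, 1_000_000)  # floor keeps pre-epoch values normalized
    nanos = micros * 1_000 + picos_of_micro // 1_000   # sub-nanosecond picos are truncated
    return seconds, nanos

assert trino_long_to_velox(1_000_001, 500_000) == (1, 1_500)
```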

This PR enables vLLM multiprocessing in the CPU backend to improve async LLM engine performance and to support TP. The main changes include:
- Use utilities from ```vllm.executor.multiproc_worker_utils``` to manage workers in...

x86 CPU
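The PR above references ```vllm.executor.multiproc_worker_utils```; the sketch below shows only the generic worker-pool pattern that kind of utility builds on, not vLLM's actual implementation:

```python
import multiprocessing as mp

def worker_loop(rank: int, tasks: "mp.Queue", results: "mp.Queue") -> None:
    # Each worker process would own one TP shard of the model and serve
    # execute-model requests until it sees the shutdown sentinel (None).
    while True:
        task = tasks.get()
        if task is None:
            break
        results.put((rank, f"ran {task}"))

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker_loop, args=(r, tasks, results)) for r in range(2)]
    for w in workers:
        w.start()
    for step in ("prefill", "decode"):  # broadcast each step to every rank
        for _ in workers:
            tasks.put(step)
    for _ in range(2 * len(workers)):
        print(results.get())
    for _ in workers:
        tasks.put(None)                 # shut the pool down
    for w in workers:
        w.join()
```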

This PR provides the corresponding CPU kernels for compressed-tensor INT8 W8A8, based on oneDNN, enabling compressed-tensor operations to be lowered to the CPU device. Both the static and dynamic modes are...

x86 CPU
ready
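A minimal sketch of the static-vs-dynamic distinction the PR above mentions, written in plain PyTorch rather than the oneDNN kernels themselves:

```python
import torch

def quantize_int8(x: torch.Tensor, scale: torch.Tensor | None = None):
    # Static mode: a precomputed (calibration-time) scale is passed in.
    # Dynamic mode: the scale is derived per token (row) at run time.
    if scale is None:
        scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

x = torch.randn(4, 8)
q_dyn, s_dyn = quantize_int8(x)                      # dynamic per-token scales
q_sta, s_sta = quantize_int8(x, torch.tensor(0.05))  # static precomputed scale
```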

Generate custom activation ops using ```torch.compile``` for the CPU backend. Main changes to vLLM:
- ~~Add ```_forward_native_impl``` to each custom op to avoid recompilation caused by tracing ```self```.~~

For vicuna-7b-v1.5, there...

x86 CPU
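A minimal sketch of compiling a vLLM-style gated activation with ```torch.compile```; the function below is an illustration of the technique, not the PR's actual op definitions:

```python
import torch
import torch.nn.functional as F

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # Gated activation used by Llama-family MLPs: SiLU on the first half
    # of the last dimension, multiplied elementwise by the second half.
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]

compiled = torch.compile(silu_and_mul, dynamic=True)  # dynamic shapes avoid recompiles
y = compiled(torch.randn(4, 2 * 11008))               # e.g. a fused gate/up projection output
```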

Upgrade the CPU backend's torch to 2.6.0; all tests verified locally. Waiting for #12721.

ci/build