Kuntai Du
### Motivation
There are more and more use cases where we need to transfer KV caches between vLLM instances, or store KV caches for future use. Some concrete use cases: ...
This is a follow-up PR for #5557. Goal: implement disaggregated prefilling by launching 2 vLLM instances (one for prefilling, one for decoding) and forwarding the KV cache from the prefilling...
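For a rough illustration of the intended setup, here is a minimal sketch of the prefill-side ("KV producer") instance. It assumes the `KVTransferConfig` interface from vLLM's later disaggregated-prefill examples; the connector name, field values, and model are assumptions for illustration, not text from this PR.

```python
# Hypothetical sketch of the prefill ("KV producer") instance in a
# disaggregated-prefill pair; a second instance launched with
# kv_role="kv_consumer" and kv_rank=1 would receive the forwarded caches.
# The KVTransferConfig fields follow vLLM's example scripts and may
# differ across versions.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="PyNcclConnector",  # NCCL-based KV-cache transfer
    kv_role="kv_producer",           # this instance produces KV caches
    kv_rank=0,                       # rank 0 = prefill, rank 1 = decode
    kv_parallel_size=2,              # one prefill + one decode instance
)

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=ktc,
    gpu_memory_utilization=0.8,
)

# max_tokens=1: the prefill instance only needs to build the KV cache;
# the decode instance continues generation from the transferred cache.
llm.generate(["San Francisco is a"],
             SamplingParams(temperature=0, max_tokens=1))
```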
Following PR #5073, this PR aims to compare `vllm` with alternatives (like TGI, TensorRT-LLM, and LMDeploy; feel free to comment if you feel there are other alternatives we...
### System Info
I am working on the benchmarking suite in the vLLM team, and am now trying to run TensorRT-LLM for comparison. I am relying on this GitHub repo (https://github.com/neuralmagic/tensorrt-demo) to...
### System Info
Docker image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
Device: 8x H100
trt-llm backend: v0.11.0

### Who can help?
@byshiue @schetlur-nv

### Information
- [ ] The official example scripts
- [X] My...
### Proposal to improve performance
_No response_

### Report of performance regression
_No response_

### Misc discussion on performance
To reproduce vLLM's performance benchmark, please launch a shell in the...
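For concreteness, here is a sketch of one way to drive the benchmark once inside such a shell, using the `benchmarks/benchmark_serving.py` client from the vLLM source tree. The model name, dataset file, and request rate are placeholders, not values from this issue.

```python
# Sketch: run the serving benchmark against an already-running vLLM
# server (e.g. started separately with `vllm serve <model>`).
# Assumes the ShareGPT dataset file has been downloaded locally.
import subprocess

subprocess.run(
    [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", "vllm",
        "--model", "meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder
        "--dataset-name", "sharegpt",
        "--dataset-path", "ShareGPT_V3_unfiltered_cleaned_split.json",
        "--request-rate", "4",  # requests per second; placeholder
    ],
    check=True,
)
```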
Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang
This PR deprecates block manager v1 and makes block manager v2 the default to simplify the code path. This is supported by this [benchmark](https://docs.google.com/document/d/1XxYUFai07ta5rE7OdtCVhLJ5J0oAxEqrGgarFdjv0Zc/edit?usp=sharing), where block manager v2 is 500...
TL;DR: implemented disaggregated prefill with...