Jee Jee Li
Currently, while the sgmv I've implemented achieves high performance in long-sequence scenarios, it falls short of Punica's bgmv for small batches and short sequences. I'm working...
Uploaded the test results using Yi-34B:
Outstanding issues:
- Full-shard LoRA support
- Fix the remaining LoRA tests
For reference, the first integration method is marked at this commit: https://github.com/jeejeelee/vllm/tree/00e007695c8cfa466f53fa74a0a601aa42a10cd7
> Will you merge this soon? Thank you for your attention.

I'm not sure if we can merge yet, but I have completed most of the development work. You can...
@simon-mo Could you please check why the CI test failed? I have actually completed the unit tests locally and would like to see if there are any omissions.
@Yard1 Thanks for your review, I will fix these asap
> max_num_batched_tokens must be
# Libentry test

The current version of Triton used in vLLM is 2.3.1, while the official Triton version is 3.0.0. Therefore, we tested the usage of libentry in these two...