Jee Jee Li
> After reading the conversation [here](https://github.com/vllm-project/vllm/issues/6126#issuecomment-2208852102), it sounds like we would also need to set this env variable accordingly when using Triton punica kernel (e.g., once we merge [this](https://github.com/vllm-project/vllm/pull/5036) PR)....
Have you tried v0?

```bash
VLLM_USE_V1=0 vllm serve ....
```
Similar issue: https://github.com/vllm-project/vllm/issues/17392
I'll try to reproduce and address this issue.
Can you try #17370? It should fix this issue.
Could you try https://github.com/vllm-project/vllm/pull/17435? Please rebuild from source.
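For reference, a minimal sketch of checking out that PR and rebuilding from source, assuming your `origin` remote points at vllm-project/vllm (the local branch name `pr-17435` is just illustrative):

```bash
# Fetch the PR branch and switch to it
git fetch origin pull/17435/head:pr-17435
git checkout pr-17435

# Rebuild vLLM from source in the current environment
pip install -e .
```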
This problem should not occur if torch 2.7.0 is used.
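A quick way to check which torch version is installed, and a generic upgrade command (pick the wheel matching your CUDA setup):

```bash
# Print the installed torch version
python -c "import torch; print(torch.__version__)"

# Upgrade if it is older than 2.7.0
pip install --upgrade "torch>=2.7.0"
```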
See: https://docs.vllm.ai/en/latest/features/lora.html#dynamically-serving-lora-adapters
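As a rough sketch of what that doc page describes (the adapter name and path below are placeholders, and the server is assumed to have been started with `--enable-lora`):

```bash
# Allow runtime LoRA updates before starting the server
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True

# Dynamically load a LoRA adapter into the running server
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "sql_adapter", "lora_path": "/path/to/sql-lora-adapter"}'

# Unload it again when it is no longer needed
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "sql_adapter"}'
```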
This PR has been open for quite a while, and it seems no one is interested in it.
@tdoublep Thank you for your contribution. Considering that LoRA has many variants, we can probably only maintain and support the most commonly used features. I'm not sure whether we should consider this...