Jee Jee Li
# Motivation

LoRA is highly favored within the vLLM community, and there are numerous LoRA-related issues and pull requests. Thanks to @Yard1's great work, we...
### Your current environment

```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6...
```
The current CI LoRA tests are quite time-consuming, which hampers the development of LoRA-related features. Based on testing on my local single 3090, the three most time-consuming tests are:

| case | time |
| --- | --- |
...
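For locating the slow cases, pytest's built-in duration report is one option; a minimal sketch (the `tests/lora` path is an assumption about the repo layout):

```python
# Run the LoRA suite and report the slowest tests.
# Equivalent to `pytest tests/lora --durations=10 -q` from the shell;
# the tests/lora path is an assumption about the repo layout.
import subprocess

subprocess.run(
    ["pytest", "tests/lora", "--durations=10", "-q"],
    check=False,
)
```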
Attempt to advance the task of `VLM with LoRA` as described in [#4194](https://github.com/vllm-project/vllm/issues/4194#issue-2252314187), choosing [MiniCPM-V 2.5](https://github.com/vllm-project/vllm/blob/v0.5.4/vllm/model_executor/models/minicpmv.py#L811) as the implementation target (a declaration sketch follows the task list).

* [ ] Add unit tests
* [ ] Analyze...
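For context, this is a minimal sketch of the per-model declaration pattern that LoRA support typically relies on in vLLM; the concrete module names for MiniCPM-V below are assumptions, not the final implementation:

```python
import torch.nn as nn

class MiniCPMV(nn.Module):
    # Maps vLLM's fused layers back to the original HF sub-layers so that
    # LoRA weights trained against HF checkpoints can be sliced correctly.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    # Static allow-list of modules that may receive LoRA adapters
    # (assumed names for the language backbone's projections).
    supported_lora_modules = [
        "qkv_proj", "o_proj", "gate_up_proj", "down_proj",
    ]
    embedding_modules = {}
    embedding_padding_modules = []
```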
### System Info

torch==2.4.0
transformers==4.45.0

### Who can help?

_No response_

### Information

- [X] The official example scripts
- [ ] ...
When I test the Llama model using V1, it outputs the following information, which I believe should only appear for multimodal models:

```text
WARNING 02-14 12:07:01 registry.py:340] `mm_limits` has already...
```
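A minimal repro along these lines (the model name and the V1 toggle are assumptions about the reporter's setup):

```python
# Hypothetical repro: a text-only Llama model run through the V1 engine
# should not trigger multimodal-registry warnings such as `mm_limits`.
import os

os.environ["VLLM_USE_V1"] = "1"  # opt into the V1 engine (assumed toggle)

from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model
print(llm.generate("Hello"))
```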
## Motivation

Remove the LoRA-related static variable `supported_lora_modules`, which not only makes our model implementations cleaner but also enables smoother LoRA support (a rough sketch of a dynamic alternative follows the work list).

## Work

- [ ] Delete `supported_lora_modules` from all models...
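One direction is to derive the targetable modules from the instantiated model instead of a hard-coded list; a rough sketch (the helper and its criteria are illustrative, not this PR's actual implementation):

```python
import torch.nn as nn

def infer_lora_target_modules(model: nn.Module) -> set[str]:
    """Collect leaf names of linear layers as LoRA candidates."""
    targets: set[str] = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # "model.layers.0.self_attn.q_proj" -> "q_proj"
            targets.add(name.rsplit(".", 1)[-1])
    return targets
```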
## Motivation

Some models, such as Qwen2.5-VL, have modified their layer hierarchy compared to the original `transformers` implementation. This change causes quantization's skip modules to become ineffective, leading to incorrect...
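Illustratively, skip (ignore) lists are usually matched against fully qualified module names, so a renamed hierarchy silently stops matching; the module names below are assumptions:

```python
def is_skipped(module_name: str, skip_modules: list[str]) -> bool:
    # Typical substring match against the fully qualified module name.
    return any(skip in module_name for skip in skip_modules)

# Checkpoint config skips the vision tower under its original HF name:
skip_modules = ["visual.blocks"]

print(is_skipped("visual.blocks.0.attn.qkv", skip_modules))        # True
print(is_skipped("vision_model.blocks.0.attn.qkv", skip_modules))  # False after rename
```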