Jee Jee Li

209 comments by Jee Jee Li

> @jeejeelee thanks, this is looking good. Can you add a comment to libentry code that it can be removed with triton 3.0.0?

Thank you for your review. I have...

@noooop Thank you for providing this very useful information. I will verify the NaN output ASAP

@noooop @manitadayon I can reproduce this issue using Qwen1.5-14B-Chat-GPTQ. I've implemented a temporary workaround that fixes it locally; please see: https://github.com/jeejeelee/vllm/blob/qwen2-overflow-clamp/vllm/model_executor/models/qwen2.py#L237-L246. The code snippet used...

@mgoin Thanks for your response. Could you test with TP=2? I tested locally, and TP=1 produced reasonable results. If I remember correctly, we downloaded the model from [Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4).

All the LoRA tests have failed again

It seems these modifications have significantly increased the time the LoRA tests take to run. ![image](https://github.com/user-attachments/assets/be6080be-abc4-4c3e-b913-c9b1fa2de95f)

Can you try https://github.com/vllm-project/vllm/pull/17370?