Jee Jee Li
> @jeejeelee thanks, this is looking good. Can you add a comment to libentry code that it can be removed with triton 3.0.0? Thank you for your review. I have...
Could you please provide your running script?
@noooop Thank you for providing this very useful information. I will verify the NaN output ASAP.
@noooop @manitadayon I can reproduce this issue using Qwen1.5-14B-Chat-GPTQ, and I've now implemented a temporary fix that resolves it locally; please see: https://github.com/jeejeelee/vllm/blob/qwen2-overflow-clamp/vllm/model_executor/models/qwen2.py#L237-L246. The code snippet used...
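For reference, the linked branch's workaround amounts to clamping intermediate activations back into the finite fp16 range so they can't overflow to inf (which later becomes NaN). A minimal stdlib sketch of that idea; the function name and margin are illustrative, not the actual vLLM code:

```python
# Hypothetical sketch of a clamp-based fp16 overflow workaround.
# Names and the margin value are illustrative, not the real vLLM code.

FP16_MAX = 65504.0  # largest finite value representable in IEEE float16

def clamp_to_fp16_range(values, margin=1000.0):
    """Pull values back inside the finite fp16 range so downstream
    ops don't produce inf, which then propagates as NaN."""
    lo, hi = -FP16_MAX + margin, FP16_MAX - margin
    return [min(max(v, lo), hi) for v in values]

print(clamp_to_fp16_range([70000.0, -70000.0, 123.0]))
# -> [64504.0, -64504.0, 123.0]
```

In the actual model code the clamp would be applied to the tensor of hidden states (e.g. with a tensor-level clamp op) rather than element by element like this.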
@noooop I agree; I just wanted to give you feedback on my testing results.
@mgoin Could you please look at this thread? Thanks.
@mgoin Thanks for your response, could you test with TP=2? I tested locally and TP=1 produced reasonable results. If I remember correctly, we downloaded the model from [Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4)
All the LoRA tests have failed again
It seems these modifications have significantly increased the time consumption of the LoRA tests.
Can you try https://github.com/vllm-project/vllm/pull/17370?