Jee Jee Li
> @jeejeelee thanks, this is looking good. Can you add a comment to libentry code that it can be removed with triton 3.0.0? Thank you for your review. I have...
Could you please provide your running script?
@noooop Thank you for providing this very useful information. I will verify the NaN output ASAP.
@noooop @manitadayon I can reproduce this issue using Qwen1.5-14B-Chat-GPTQ, and I've now implemented a temporary fix that resolves it locally; please see: https://github.com/jeejeelee/vllm/blob/qwen2-overflow-clamp/vllm/model_executor/models/qwen2.py#L237-L246. The code snippet used...
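For reference, the linked branch's workaround amounts to clamping intermediate activations back into the finite fp16 range so they can't overflow to inf (which later becomes NaN). A minimal stdlib sketch of that idea; the function name and margin are illustrative, not the actual vLLM code:

```python
# Hypothetical sketch of a clamp-based fp16 overflow workaround.
# Names and the margin value are illustrative, not the real vLLM code.

FP16_MAX = 65504.0  # largest finite value representable in IEEE float16

def clamp_to_fp16_range(values, margin=1000.0):
    """Pull values back inside the finite fp16 range so downstream
    ops don't produce inf, which then propagates as NaN."""
    lo, hi = -FP16_MAX + margin, FP16_MAX - margin
    return [min(max(v, lo), hi) for v in values]

print(clamp_to_fp16_range([70000.0, -70000.0, 123.0]))
# -> [64504.0, -64504.0, 123.0]
```

In the actual model code the clamp would be applied to the tensor of hidden states (e.g. with a tensor-level clamp op) rather than element by element like this.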
@noooop I agree; I just wanted to give you feedback on my testing results.
@mgoin Could you please look at this thread? Thanks.
@mgoin Thanks for your response, could you test with TP=2? I tested locally and TP=1 produced reasonable results. If I remember correctly, we downloaded the model from [Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4)
All the LoRA tests have failed again
It seems these modifications have significantly increased the time consumption of the LoRA tests.
Can you try https://github.com/vllm-project/vllm/pull/17370?