Yang Fan
> Thanks for implementing this (and sorry for the delayed response)! Since this PR not only introduces a new modality (video) but also involves the first model to accept multiple...
Hi @DarkLight1337 @ywang96, I have updated this PR based on your review comments; please take another look. I have also added some notes about multiple modalities in the PR overview.
> @fyabc Hi, can this patch support multiple images in one prompt, like the following:
>
> ```
> Compute the value of the expression in the image below \nby using...
> ```
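For illustration only (not taken from this PR's code): a multi-image request through vLLM's offline API could look roughly like the sketch below, assuming the generic multi-image path (`limit_mm_per_prompt` plus a list under `multi_modal_data`) and the Qwen2-VL vision placeholder tokens.

```python
# Hedged sketch: two images in one prompt. Model name, placeholder format and
# the multi-image API are assumptions, not what this PR necessarily ships.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

# Qwen2-VL marks each image with <|vision_start|><|image_pad|><|vision_end|>.
prompt = (
    "<|vision_start|><|image_pad|><|vision_end|>"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Compute the value of the expression shown in the two images."
)

outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": [Image.open("a.png"), Image.open("b.png")]},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```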
> I did some local testing on this PR and it's working well for both `.jpg` and `.mp4` inputs, on TP=1 and TP=2.
>
> Note that I did run...
> The use of `qwen-vl-utils` is quite different from that of the existing models, which fully rely on the `AutoProcessor` from HuggingFace. Is there a particular reason why the preprocessing logic for...
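For context on the question above, the typical split of work between `qwen-vl-utils` and the HuggingFace processor looks roughly like this (a sketch based on the upstream Qwen2-VL usage, not on this PR's internal preprocessing):

```python
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# qwen-vl-utils fetches/resizes images and samples video frames; the HuggingFace
# processor then tokenizes the text and packs the vision tensors.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
```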
> > I happened to notice something while following this PR. I've merged it locally to run some tests, and during testing, I encountered a strange issue where, after several...
> support qwen2 vl awq model?

@seanzhang-zhichen Yes, this PR supports AWQ models. You can check [this issue](https://github.com/QwenLM/Qwen2-VL/issues/20) for more details.
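As a quick illustration (the checkpoint name and explicit flag are assumptions, not taken from this PR), loading an AWQ export would look something like:

```python
from vllm import LLM

# Assumed AWQ checkpoint name; any Qwen2-VL AWQ export should load the same way.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct-AWQ",
    quantization="awq",  # vLLM can also auto-detect this from the model config
)
```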
> I found that GPTQ quantization raises the following error. If I skip loading these weights, following the qwen2 code, it runs. Is it reasonable to merge...
> Huge thanks for the PR! We are very happy to support this model very soon :)
>
> I just left some comments mostly requiring some clarifications about mrope....
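For readers unfamiliar with mrope: Qwen2-VL replaces the usual 1-D rotary position index with a 3-D one (temporal, height, width), where text tokens keep the three components equal and vision tokens index into their grid. A toy sketch of how such position IDs can be laid out (illustrative only; the real `get_rope_index` logic handles interleaving, offsets, and videos):

```python
import torch

def toy_mrope_positions(num_text_before: int, t: int, h: int, w: int) -> torch.Tensor:
    """Illustrative 3-D position IDs: a text prefix followed by one t*h*w vision grid."""
    # Text tokens: temporal == height == width == running index.
    text = torch.arange(num_text_before).repeat(3, 1)          # (3, num_text_before)

    # Vision tokens: each axis gets its own grid coordinate, offset after the text.
    start = num_text_before
    tt = torch.arange(t).view(t, 1, 1).expand(t, h, w).reshape(-1)
    hh = torch.arange(h).view(1, h, 1).expand(t, h, w).reshape(-1)
    ww = torch.arange(w).view(1, 1, w).expand(t, h, w).reshape(-1)
    vision = torch.stack([tt, hh, ww]) + start                  # (3, t*h*w)

    return torch.cat([text, vision], dim=1)                     # (3, total_tokens)

print(toy_mrope_positions(num_text_before=4, t=1, h=2, w=2))
```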
Hi @DarkLight1337 @ywang96 @WoosukKwon, I have updated this PR according to the review comments; please take another look:

1. Added an `xformers` backend for `Qwen2VisionAttention`, removing the dependency on `flash-attn`.
2. ...
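Regarding item 1, a minimal sketch of what an `xformers`-based attention path can look like (names and shapes are illustrative, not the exact `Qwen2VisionAttention` code in this PR):

```python
import torch
from xformers import ops as xops

def vision_attention_xformers(q: torch.Tensor,
                              k: torch.Tensor,
                              v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, seq_len, num_heads, head_dim).

    Uses xformers' memory-efficient attention instead of flash-attn, so the
    vision tower can run on setups where flash-attn is unavailable.
    """
    # A block-diagonal attn_bias could keep patches of different images from
    # attending to each other; omitted in this sketch.
    out = xops.memory_efficient_attention(q, k, v)         # (B, S, H, D)
    return out.reshape(out.shape[0], out.shape[1], -1)     # merge heads
```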