AmazDeng
> We don't support this model yet. According to Hugging Face download statistics, its popularity has been declining over the past half month. @AdamzNV Please take a look at...
> Hi [@AmazDeng](https://github.com/AmazDeng), we just enabled InternVL2 support. Do you still have any further issues or questions? If not, we'll close this soon.

@nv-guomingz Thank you for your work. Could...
> https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#internvl2

@nv-guomingz I checked the documentation, and TensorRT-LLM only supports InternVL2-1B through InternVL2-26B. So, does TensorRT-LLM support InternVL2-40B?
@kcz358 @ZhangYuanhan-AI @Luodian Could you please take a look at this issue?
@Luodian @ZhangYuanhan-AI @kcz358 Thank you for your response. I need to point out that the reason for the excessively high GPU memory usage during video inference is that after the...
> `image_features.append(self.get_2dPool(image_feat))`

The code you referenced is located in the `encode_multimodals` method. However, in `llava_arch.py` on the main branch, `encode_multimodals` is commented out. @kcz358
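For context, here is a minimal sketch of what a `get_2dPool`-style step does to per-frame vision tokens (the 24×24 patch grid and stride of 2 are illustrative assumptions, not values taken from the repo): the flat token sequence is reshaped back into its spatial grid and average-pooled, quartering the token count for each frame.

```python
import torch
import torch.nn.functional as F

def pool_frame_features(image_feat: torch.Tensor, grid: int = 24, stride: int = 2) -> torch.Tensor:
    """Downsample per-frame vision tokens with 2D average pooling.

    image_feat: [num_frames, grid*grid, dim] flattened patch tokens per frame.
    Returns:    [num_frames, (grid // stride) ** 2, dim].
    """
    frames, tokens, dim = image_feat.shape
    assert tokens == grid * grid, "token count must match the assumed patch grid"
    x = image_feat.view(frames, grid, grid, dim).permute(0, 3, 1, 2)  # -> NCHW
    x = F.avg_pool2d(x, kernel_size=stride)  # halve each spatial side
    return x.permute(0, 2, 3, 1).reshape(frames, -1, dim)

# e.g. 32 frames x 576 tokens (24x24 grid) -> 32 frames x 144 tokens
pooled = pool_frame_features(torch.randn(32, 576, 1024))
print(pooled.shape)  # torch.Size([32, 144, 1024])
```

If this pooling is skipped (e.g. because the method containing it is commented out), every frame keeps its full token count, which would explain a large jump in GPU memory during video inference.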
> These lines contain the processing logic, not `encode_multimodals`.
>
> https://github.com/LLaVA-VL/LLaVA-NeXT/blob/3fbf54b4dbd72a060104253e6f08168df48f6625/llava/model/llava_arch.py#L232-L236

I did as you said and replaced `process_images` with `image_processor`. I printed out the shape after the...
> The problem is actually that you are still processing the video with incorrect logic, even though you are using `image_processor` to process images. The video frames are treated as multiple...
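For reference, here is a minimal sketch of the frame-as-batch approach being described; `decord` for frame sampling, the CLIP processor checkpoint, and the `sample.mp4` path are all assumptions for illustration. All frames are preprocessed in one call and kept as a single stacked tensor, rather than routed through the multi-image (anyres) path:

```python
import numpy as np
import torch
from decord import VideoReader
from transformers import CLIPImageProcessor

# assumption: a CLIP-style processor matching the model's vision tower
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

def load_video_frames(video_path: str, num_frames: int = 32) -> np.ndarray:
    """Uniformly sample frames from a video as [num_frames, H, W, 3] uint8."""
    vr = VideoReader(video_path)
    idx = np.linspace(0, len(vr) - 1, num_frames, dtype=int)
    return vr.get_batch(idx).asnumpy()

frames = load_video_frames("sample.mp4")
# preprocess all frames in one call and keep them as one stacked video tensor
video = image_processor.preprocess(list(frames), return_tensors="pt")["pixel_values"]
print(video.shape)  # [num_frames, 3, 336, 336]
video = video.half().cuda()
```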
I also tried passing the `do_sample=False` parameter; inference didn't throw any errors, but it was ineffective, as the results within a batch were still not exactly identical.
I ran the batch-inference loop ten times and found that although the results within each batch varied, the results across batches were exactly the same. So, where is the...
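For anyone trying to reproduce this, here is a minimal sketch of the within-batch determinism check, using GPT-2 as a stand-in (the actual model and prompt are assumptions): the same prompt is duplicated inside one batch and decoded greedily; if inference were fully deterministic, every row should come out identical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# tiny stand-in model purely to show the repro pattern
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

input_ids = tok("Describe the video.", return_tensors="pt").input_ids
batch = input_ids.repeat(4, 1)  # same prompt duplicated four times in one batch
with torch.inference_mode():
    out = model.generate(batch, do_sample=False, max_new_tokens=32)

rows = {tuple(r.tolist()) for r in out}
print("identical within batch:", len(rows) == 1)  # expect True under greedy decoding
```

If the rows differ within a batch but repeat exactly across runs, the variation is coming from batched computation itself (e.g. non-deterministic kernels or padding effects), not from sampling.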