[BUG] number of image start tokens and image end tokens mismatch
### Is there an existing issue / discussion for this?

- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ?

- [X] I have searched the FAQ
### Current Behavior

When the lengths of `image_start_tokens` and `image_end_tokens` differ, `valid_image_nums` is set to the larger of the two, so `torch.hstack` fails with a tensor size mismatch. Should the `max` be a `min`?

https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/processing_minicpmv.py#L119
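To make the failure concrete, here is a minimal sketch (the tensor values are invented and the surrounding processor code is simplified; only the `max`/`min` and `torch.hstack` pattern follows the line linked above):

```python
import torch

# Simplified reproduction of the mismatch: three start markers were found,
# but only two end markers (e.g. the last pair was truncated away).
image_start_tokens = torch.tensor([3, 10, 17])
image_end_tokens = torch.tensor([9, 16])

valid_image_nums = max(len(image_start_tokens), len(image_end_tokens))  # 3

# Fails: hstack receives a (3, 1) tensor and a (2, 1) tensor.
# image_bounds = torch.hstack([
#     image_start_tokens[:valid_image_nums].unsqueeze(-1),
#     image_end_tokens[:valid_image_nums].unsqueeze(-1),
# ])

# With min, only fully paired markers are kept and the stack succeeds:
valid_image_nums = min(len(image_start_tokens), len(image_end_tokens))  # 2
image_bounds = torch.hstack([
    image_start_tokens[:valid_image_nums].unsqueeze(-1),
    image_end_tokens[:valid_image_nums].unsqueeze(-1),
])
print(image_bounds)  # tensor([[ 3,  9], [10, 16]])
```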
### Expected Behavior

No response
### Steps To Reproduce

Run the video example with `video_path="./assets/demo_video.mp4"`:

https://github.com/OpenBMB/MiniCPM-V?tab=readme-ov-file#chat-with-video
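For reference, a condensed sketch of that example (the frame-sampling helper and the exact `model.chat` arguments are simplified from the README, so treat the details as approximate; `decord` is required):

```python
import torch
from PIL import Image
from decord import VideoReader, cpu
from transformers import AutoModel, AutoTokenizer

MAX_NUM_FRAMES = 64  # README default; see the discussion of this value below

def encode_video(video_path):
    # Sample frames at roughly 1 fps, capped at MAX_NUM_FRAMES
    # (simplified from the README helper).
    vr = VideoReader(video_path, ctx=cpu(0))
    step = max(1, round(vr.get_avg_fps()))
    idx = list(range(0, len(vr), step))[:MAX_NUM_FRAMES]
    return [Image.fromarray(f) for f in vr.get_batch(idx).asnumpy()]

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa',
                                  torch_dtype=torch.bfloat16).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6',
                                          trust_remote_code=True)

frames = encode_video("./assets/demo_video.mp4")
msgs = [{'role': 'user', 'content': frames + ['Describe the video.']}]
# The "number of image start tokens and image end tokens mismatch" error
# is raised inside the processor during this call:
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer,
                    use_image_id=False, max_slice_nums=1)
print(answer)
```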
### Environment
- OS: Ubuntu 20.04
- Python: 3.10
- Transformers: 4.40.0
- PyTorch: 2.1.2
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8
### Anything else?

No response
Hello, maybe your model length setting is too small and the video is too long: when the input is truncated to the model's maximum length, a trailing image end token can be cut off while its matching start token survives, which produces the mismatch.
@LDLINGLINGLING Yes, downsampling works around this, but I still think L119 is incorrect, since it will stack two tensors of different lengths.
Where do I set the model length?
In the video example, set:

```python
MAX_NUM_FRAMES = 40  # default is 64; if CUDA OOM, set a smaller number when running inference on videos
```
40 may be the largest value that fits, since the maximum number of tokens is 8192.