Orr Zohar
**Describe the bug** I am training a video-LLM model, where I encode long videos with a varying number of forward passes to avoid OOM issues. I would like to use...
**Is your feature request related to a problem? Please describe.** It is very difficult to train MM models (e.g., multi-image chat/video chat) in ZeRO-3 because the effective ``vision batch``>>``text...
Hi, For the InternVideo2-S/B/L encoders: what value was used for `sep_image_video_pos_embed`? It seems like this was set to true in the 1b/6b models, but false in S/B/L. I am trying to...
Hi, When you do Sequence Parallel, you are padding with token id 2 = '#': https://github.com/NVlabs/VILA/blob/2b43308f25e63161a172fe9a38e3a04e2fcd12ef/llava/data/dataset.py#L1372-L1389 Could you let me know why you are padding with this instead of...
Hi, Will you open-source, or can you share, the raw evaluations for proprietary models? Best, Orr