LLaVA-NeXT

Fix: videos in LLaVa-OV

Open zucchini-nlp opened this issue 1 year ago • 0 comments

Currently, running the demo notebook for LLaVA OneVision on video input doesn't apply pooling to all video patches/frames, because the modality list holds one value per prompt, while a video can contain several frames. This PR fixes the demo notebook by repeating the modality entry once for every video frame.
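For illustration, the expansion could look roughly like the sketch below. It is not the notebook's actual code: `expand_modalities` and `frame_counts` are hypothetical names, and it assumes each image keeps a single entry while each video gets one entry per sampled frame.

```python
from typing import List, Sequence


def expand_modalities(
    modalities: Sequence[str], frame_counts: Sequence[int]
) -> List[str]:
    """Repeat each "video" entry once per frame (hypothetical helper).

    modalities:   one entry per visual input, e.g. ["video"] or ["image", "video"]
    frame_counts: number of frames for each visual input (assumed to be 1 for images)
    """
    if len(modalities) != len(frame_counts):
        raise ValueError("modalities and frame_counts must have the same length")

    expanded: List[str] = []
    for modality, n_frames in zip(modalities, frame_counts):
        # Assumption: only videos need one entry per frame; images stay as-is.
        repeat = n_frames if modality == "video" else 1
        expanded.extend([modality] * repeat)
    return expanded


# One prompt with a single video sampled to 8 frames.
per_frame_modalities = expand_modalities(["video"], [8])
print(len(per_frame_modalities))
```

The expanded list would then be passed wherever the notebook previously passed the per-prompt list, so that every frame is treated as video and gets pooled.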

I tried to see if we can expand the modalities inside the modeling code, but it seems hard to infer which visual in the input is an image and which is a video, so I decided to delegate the expansion to users.

zucchini-nlp, Aug 30 '24 09:08