LLaVA-NeXT

Interleave demo limited to 13 images?

Open uahic opened this issue 1 year ago • 0 comments

I'm trying out the Qwen model with the interleave demo code. Beyond 13 images in a single prompt, the model always returns an empty response. Also, when I ask the model how many image tokens I provided, it gives wrong answers, although it describes the content of the images correctly. For example, with 6 images it says there are 10. Maybe this is due to training, but maybe something in the codebase is wrong.
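To rule out a mismatch on the prompt side, a check along these lines can confirm that the prompt contains exactly one image placeholder per image before it is sent to the model. This is only a minimal sketch: the literal `<image>` placeholder string and the helper names `build_interleave_prompt` and `check_image_placeholders` are assumptions for illustration, not taken from the codebase.

```python
# Assumed placeholder string; adjust to whatever the prompt template actually uses.
IMAGE_PLACEHOLDER = "<image>"


def build_interleave_prompt(question: str, num_images: int,
                            placeholder: str = IMAGE_PLACEHOLDER) -> str:
    """Hypothetical helper: put one placeholder line per image before the question."""
    return f"{placeholder}\n" * num_images + question


def check_image_placeholders(prompt: str, num_images: int,
                             placeholder: str = IMAGE_PLACEHOLDER) -> int:
    """Return the placeholder count, raising if it differs from the number of images."""
    found = prompt.count(placeholder)
    if found != num_images:
        raise ValueError(
            f"prompt contains {found} placeholder(s) but {num_images} image(s) were given"
        )
    return found


if __name__ == "__main__":
    prompt = build_interleave_prompt("How many images did I provide?", 6)
    print(check_image_placeholders(prompt, 6))
```

If this check passes and the model still reports the wrong count, the mismatch is more likely in how the images are encoded or in the model itself than in the prompt text.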

I've tested this with the most recent commit (e96b45bf69f578b87b1840c2cbbc617507457103), sampling frames from the 'xU25MMA2N4aVtYay.mp4' video with your sample_frames(...) function.
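For anyone reproducing this without the repository's helper, evenly spaced frame indices can be picked roughly as follows. This is a standalone stand-in and not the repository's `sample_frames`; the function name `uniform_frame_indices` and its behavior (including clipping the sample count to the number of frames) are assumptions for illustration.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices, always including the first and last frame.

    Hypothetical helper, not the repository's sample_frames.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Never ask for more frames than the video has.
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the indices exact and strictly increasing.
    return [i * (total_frames - 1) // (num_samples - 1) for i in range(num_samples)]


if __name__ == "__main__":
    # Example: 14 frames from a 300-frame video, one more than the 13 that still worked.
    print(uniform_frame_indices(300, 14))
```

Stepping `num_samples` from 13 to 14 with the same video is enough to tell whether the empty response depends only on the image count.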

Edit: I just noticed that the old interleave script is gone (I'm using my own modified, Gradio-free copy from the previous commits) and that the new video demo script uses only one token. Huh.

uahic Aug 08 '24 10:08