LLaVA-NeXT Why is LLaVA-OV interleaved inference and video inference configuration different?

Open chancharikmitra opened this issue 1 year ago • 0 comments

Hello, thanks for contributing a very exciting model! I noticed that the interleaved and video inference examples in the notebooks set up the model with different configs. Why is that?

I have tried each config on the other data type (i.e., the interleaved config for videos and vice versa). There doesn't seem to be much difference in my small set of experiments, but I would like to check whether there is some detail I am missing.

So, can I go ahead and use the simpler video inference config shown in the .ipynb?

Sep 16 '24 21:09 chancharikmitra
