LLaVA-NeXT
Why is LLaVA-OV interleaved inference and video inference configuration different?
Hello, thanks for contributing a very exciting model! I noticed that the interleaved and video inference examples in the notebooks load the model with different configs. Why is that?
I have tried each config on the other data type (i.e., the interleaved config for videos and vice versa). There doesn't seem to be much difference in my small set of experiments, but I would like to check whether there is some detail I am missing.
So, can I go ahead and use the simpler video inference config shown in the .ipynb?