LLaVA-NeXT
LLaVA-NeXT copied to clipboard
LLaMA3-8B video inference
As llama3-llava-next-8b and LLaVA-NeXT-Video-7B-DPO seem to have the same interface, is it possible to make llama3-llava-next-8b process multiple frames of one video per single forward?
Basically, I don't get the idea of what makes LLaVA-NeXT-Video a processor for multiple frames and can't find the related code. According to the blog post processing patches versus frames is the difference, so the initial question arises.