LLaVA-NeXT
Since llama3-llava-next-8b and LLaVA-NeXT-Video-7B-DPO seem to share the same interface, is it possible to have llama3-llava-next-8b process multiple frames of one video in a single forward pass? Basically, I don't get the...
Don't see any. Would pay!
Is there any documentation on the 790K training data used in the second stage? What data does it include? Are there any plans to release this data?
The [announcement blog post](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/) indicates inference can be done with sglang, but attempting to load the 7b model with the sglang backend: ``` python -m sglang.launch_server --model-path ~/models/lmms-lab_LLaVA-NeXT-Video-7B-DPO --port 30000...
To my knowledge, the videos in NExTQA dataset are relatively short, with an average video length of 44 seconds, and there is a noted static bias[1] in the ActivityNet QA...
The [checkpoint link](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff) for LLaVA-NeXT (Stronger) seems to be broken
Thanks for your work on this! When will fine-tuning scripts be made available?
Hello. Thank you for your great work. I ran into this issue: `File "/home3/user/mllm/LLaVA_NeXT/llava/mm_utils.py", line 377, in __call__ if output_ids[0, -keyword_id.shape[0] :] == keyword_id: RuntimeError: Boolean value of Tensor with...`
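For context, this class of error comes from using `==` on tensors inside an `if`: elementwise comparison returns a tensor, and the truth value of a multi-element tensor is ambiguous. NumPy arrays behave the same way, so here is a minimal sketch of the pitfall and the usual fix (assuming the intent in `mm_utils.py` is an exact-match check, which in PyTorch would be `(a == b).all()` or `torch.equal(a, b)`):

```python
import numpy as np

# Elementwise comparison returns an array of booleans, not a single bool.
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# `if a == b:` would raise:
#   ValueError: The truth value of an array with more than one element is ambiguous
# (PyTorch raises the analogous "Boolean value of Tensor with more than
#  one element is ambiguous" RuntimeError.)

# The usual fix: reduce the comparison to a single boolean first.
if (a == b).all():  # PyTorch equivalents: (a == b).all() or torch.equal(a, b)
    print("sequences match")
```

This is only an illustration of the error class, not a patch for the repository.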
Hi, I was just testing whether I could reproduce the results from your demo in my own code. I was attempting to prompt two images and then...
Thank you for the great work! I am just a bit curious about the computing resources used to train LLaVA-NeXT.