LLaVA-NeXT
Thank you for the kind release! But when I looked at the annotations of M4-Instruct, the very first sample quite confused me. Here is the snapshot: The human and...
Following the SGLang instructions in the README.md:
```
~/sglang$ bash examples/usage/llava_video/srt_example_llava_v.sh K 0 /root/sglang/examples/usage/llava_video/videos/Q98Z4OTh8RwmDonc.mp4 /root/models/LLaVA-NeXT-Video-7B-DPO 16 examples/usage/llava_video
Each video you will sample 16 frames
Number of GPUs in GPULIST: 8
...
```
I'm trying out the Qwen model in the demo code for interleave. It seems that beyond 13 images (inside one prompt) the model always outputs an empty response. Also, when...
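One way to pin down where the output turns empty is to sweep the image count. This is a minimal sketch, assuming the Hugging Face port of the interleave model (llava-hf/llava-interleave-qwen-7b-hf) and its Qwen-style prompt format rather than the repo's own demo code; the `frame_*.png` files are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-interleave-qwen-7b-hf"  # assumed HF port, not the repo demo
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image files; substitute your own interleaved inputs.
frames = [Image.open(f"frame_{i}.png") for i in range(16)]

for n in range(1, len(frames) + 1):
    # One <image> token per image, in the Qwen chat format the HF port expects.
    prompt = (
        "<|im_start|>user " + "<image>\n" * n
        + "Describe what these images have in common.<|im_end|>"
        + "<|im_start|>assistant"
    )
    inputs = processor(images=frames[:n], text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    text = processor.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print(f"{n:2d} images ->", "<EMPTY>" if not text.strip() else text[:60])
```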
Hello, great work! I have a question I'd like to ask: what prompts were used when evaluating MMMU and MathVista? I'd appreciate it if you could provide information on the...
There is no multi-image data in LLaVA-OneVision-Data (https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data). Is the data complete?
Hi @ZhangYuanhan-AI, thanks for the wonderful work. Just a question about the evaluation of the detailed description: I found that the GPT eval score is converted to...
Thanks for the excellent work. The LLaVA models utilize a subset of the training data from each source; could you please share some insights/techniques about sampling/filtering the original data, apart from finding high-quality...
Hi, I find the number of images and image placeholders inconsistent in some instances of the M4-Instruct data. For example, one sample has two image placeholders but four image paths, which is...
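For anyone wanting to scan the release for this kind of mismatch, here is a small checker sketch. It assumes the common LLaVA-style schema (an `image` field holding a path or list of paths, and `conversations` turns whose `value` contains `<image>` placeholders); adjust the keys if M4-Instruct's JSON differs, and the filename is hypothetical.

```python
import json

def check_placeholders(path: str) -> None:
    """Report samples whose <image> placeholder count != number of image paths."""
    with open(path) as f:
        samples = json.load(f)
    for i, sample in enumerate(samples):
        images = sample.get("image", [])
        if isinstance(images, str):  # single-image samples may store a bare string
            images = [images]
        n_placeholders = sum(
            turn.get("value", "").count("<image>")
            for turn in sample.get("conversations", [])
        )
        if n_placeholders != len(images):
            print(f"sample {i} (id={sample.get('id')}): "
                  f"{n_placeholders} placeholders vs {len(images)} image paths")

check_placeholders("m4_instruct_annotations.json")  # hypothetical filename
```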
Great project, I appreciate it highly :) To give something back (not much, but it may help some beginners get started), here is my code for using interleave without Gradio, implemented...
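The poster's script itself is truncated above, so as a stand-in, here is a minimal sketch in the same spirit: plain interleaved inference through the repo's own loaders, without Gradio. It assumes the lmms-lab/llava-next-interleave-qwen-7b checkpoint, the `qwen_1_5` conversation template, and placeholder image paths; the structure follows the repo's documented single-image example.

```python
import copy
import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

pretrained = "lmms-lab/llava-next-interleave-qwen-7b"  # assumed checkpoint
tokenizer, model, image_processor, _ = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto"
)
model.eval()

# Placeholder paths; any interleaved set of images works.
images = [Image.open("left.png"), Image.open("right.png")]
image_tensor = process_images(images, image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]

# Build a two-image prompt with the qwen_1_5 conversation template.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
question = (DEFAULT_IMAGE_TOKEN + "\n") * len(images) + "What is different between these images?"
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[img.size for img in images],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```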