Results 344 LLaVA-NeXT issues

I am using google/siglip-so400m-patch14-384 with LLaVA-Video-7B-Qwen2, but I get the following error: "RuntimeError: Error(s) in loading state_dict for CLIPVisionModel: size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([729, 1152])...
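The 729 in the reported shape is consistent with a 384-pixel input split into 14-pixel patches with no class token (27 x 27), which is what the SigLIP checkpoint name suggests. The sketch below only illustrates that arithmetic. The helper name `num_positions` is hypothetical, and the issue text is truncated, so the shape expected on the `CLIPVisionModel` side is not known from the report. The extra class-token row is how CLIP-style position tables are usually sized and is an assumption here.

```python
def num_positions(image_size: int, patch_size: int, class_token: bool) -> int:
    """Rows in a ViT position-embedding table (hypothetical helper)."""
    side = image_size // patch_size          # patches per side; 384 // 14 == 27
    return side * side + (1 if class_token else 0)


# SigLIP-style table: one row per patch, no class token.
siglip_positions = num_positions(384, 14, class_token=False)

# CLIP-style table at the same resolution (assumption): patches plus one class token.
clip_style_positions = num_positions(384, 14, class_token=True)

print(siglip_positions, clip_style_positions)
```

If the two counts differ, a checkpoint saved with one layout cannot be copied into a module built with the other, which is the kind of mismatch the error message describes.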

I am using llava-onevision-72B. Inference runs fine on an H800, but on an A6000 it fails with the error above. How can I solve this?

Hi folks, I was wondering whether anyone has trained llava-interleave **using the LLaVA-NeXT repo**? So far, I have prepared the m4 interleaved dataset and downloaded both image & video...

Great work, and thank you for open-sourcing it! I have a few questions. First, since I currently need to use multiple images per sample, can the lmms-lab/M4-Instruct-Data format JSON file...

Hello, when I tested [LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) and [LLaVA-Video](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2), I found that the results of LLaVA-OneVision were unexpectedly poor. Is there anything I did not set correctly? The prompt of LLaVA-OneVision is:...

Hi! I'm trying to adjust the generation configuration of the model, specifically setting the temperature parameter to 0.7. However, I noticed that even after setting it, the generated outputs don't...
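The issue text is truncated, so the actual cause is not known. One possibility, assuming decoding is greedy (sampling disabled), is that temperature cannot change the output: dividing logits by a positive temperature reshapes the probabilities but never changes which token has the highest score. The sketch below illustrates this with plain Python; the logits are made-up values and `softmax` is a local helper, not part of the LLaVA-NeXT code.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (illustrative helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]                      # hypothetical next-token scores

probs_default = softmax(logits, temperature=1.0)
probs_t07 = softmax(logits, temperature=0.7)

# Greedy decoding picks the argmax, which is the same at any positive temperature.
greedy_default = max(range(len(logits)), key=lambda i: probs_default[i])
greedy_t07 = max(range(len(logits)), key=lambda i: probs_t07[i])

print(greedy_default, greedy_t07)
print(probs_default, probs_t07)
```

Under this assumption, temperature only affects the output once tokens are sampled from the distribution rather than taken by argmax.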

The previous link returned a 404 (page not found): "The main branch of LLaVA-NeXT does not contain the path [./docs/LLaVA-NeXT.md](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision.md)."

Hi, I notice that you have commented out `encode_multimodals` (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/09e5840d5589ad2d6a8656c0a60f21ae134b3309/llava/model/llava_arch.py#L291C32-L291C55). If I understand correctly, using slow-fast features would require using `self.encode_multimodals` and not `self.encode_images`. Could you clarify this?

Hi, I am trying to fine-tune LLaVA-Video-7B and LLaVA-Video-72B. However, it seems that the checkpoints for mm_projector.bin have not been released yet. Thanks for the help in advance.

Thank you for sharing this great work. I would like to ask about some mismatches between the current codebase and the arXiv technical report. 1. SlowFast mode * Is slowfast representation only used for...