LLaVA-NeXT issues

I met RuntimeError: Error(s) in loading state_dict for CLIPVisionModel

2

I am using google/siglip-so400m-patch14-384 with LLaVA-Video-7B-Qwen2, but I get the following error: "RuntimeError: Error(s) in loading state_dict for CLIPVisionModel: size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([729, 1152])...

royal-dargon
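
The 729 in the reported shape is consistent with SigLIP weights being loaded into a CLIP model class. Below is a minimal arithmetic sketch, assuming the usual ViT patch-grid layout. `num_position_embeddings` is a hypothetical helper, not part of LLaVA-NeXT, and the CLIP ViT-L/14 at 336 px figures are an assumed comparison point that the issue does not mention.

```python
def num_position_embeddings(image_size: int, patch_size: int, has_cls_token: bool) -> int:
    """Rows in a ViT position-embedding table (hypothetical helper for illustration)."""
    # A patch embedding with stride == patch_size and no padding drops the remainder.
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if has_cls_token else 0)


# Assumed config: SigLIP so400m, 384 px input, patch size 14, no class token.
siglip_rows = num_position_embeddings(384, 14, has_cls_token=False)

# Assumed config: CLIP ViT-L/14 at 336 px, with a class token.
clip_rows = num_position_embeddings(336, 14, has_cls_token=True)

print(siglip_rows, clip_rows)  # 729 577
```

If the two row counts differ like this, the checkpoint and the model class cannot share a position-embedding table, so it may be worth checking which vision tower class the loader selected for the given model path.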

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

2

I am using llava-onevision-72B. Inference runs fine on an H800, but on an A6000 I get the error above. How can I solve it?

eternal8080
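
One way to narrow this down is to find where non-finite values first appear before sampling. The sketch below is plain Python rather than PyTorch, and both function names are hypothetical. It only illustrates how a single overflowed logit turns an entire softmax output into `nan`. Whether overflow is the actual cause on the A6000 is an assumption the issue does not confirm.

```python
import math


def softmax(logits):
    """Numerically standard softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_invalid_probs(probs):
    """Return (index, reason) for every entry a sampler would reject."""
    problems = []
    for i, p in enumerate(probs):
        if math.isnan(p):
            problems.append((i, "nan"))
        elif math.isinf(p):
            problems.append((i, "inf"))
        elif p < 0:
            problems.append((i, "negative"))
    return problems


healthy = softmax([1.0, 2.0, 3.0])
# One logit overflowed to +inf (for example after a low-precision matmul):
# inf - inf is nan, so the normaliser and every probability become nan.
overflowed = softmax([1.0, float("inf"), 2.0])

print(describe_invalid_probs(healthy))
print(describe_invalid_probs(overflowed))
```

Running the same kind of check on the real logits, layer by layer if needed, shows whether the bad values come from the model's forward pass or from the sampling settings.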

What are the training scripts for llava-interleave?

Hi folks, I was wondering whether anyone has trained llava-interleave before **using the LLaVA-NeXT repo**? I have prepared the M4 interleaved dataset and downloaded both image & video...

HuangChiEn

Regarding the feasibility of multi-image data training and LoRA training

2

Great work, and thank you for open-sourcing it! I have a few questions: First, since I currently need to use multiple images per sample, can the lmms-lab/M4-Instruct-Data format JSON file...

fragrantly0202

Some questions about the test results of LLaVA-OneVision (multi-image input) and LLaVA-Video (video input)

Hello, when I tested [LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) and [LLaVA-Video](https://huggingface.co/lmms-lab/LLaVA-Video-7B-Qwen2), I found that the results of LLaVA-OneVision were unexpectedly poor. Is there anything I did not set correctly? The prompt of LLaVA-OneVision is:...

zhousheng97

How to properly change generation config? Set temperature = 0.7, but output remains unchanged

Hi! I'm trying to adjust the generation configuration of the model, specifically setting the temperature parameter to 0.7. However, I noticed that even after setting it, the generated outputs don't...

ayiyayi
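
A common reason for this symptom, assuming decoding is greedy (for example `do_sample=False` in a Hugging Face style `generate` call, which the truncated issue text does not confirm), is that temperature rescales the logits but never changes which token scores highest. The sketch below uses plain Python and made-up logits to show this. The function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a positive temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(probs):
    """Index of the most likely token, as greedy decoding would choose."""
    return max(range(len(probs)), key=probs.__getitem__)


logits = [2.0, 1.0, 0.5]  # illustrative values, not from any model
p_default = softmax_with_temperature(logits, 1.0)
p_warm = softmax_with_temperature(logits, 0.7)

# The distribution changes shape...
print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_warm])
# ...but the greedy choice is identical, so the output text would not change.
print(greedy_pick(p_default), greedy_pick(p_warm))
```

Under that assumption, temperature only has a visible effect once sampling is enabled, because sampling draws from the reshaped distribution instead of always taking the top token.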

Fixed broken link

The previous link returned a 404 (page not found): "The main branch of LLaVA-NeXT does not contain the path [./docs/LLaVA-NeXT.md](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision.md)."

NicoSimo

slow fast features not being used in current code

2

Hi, I notice that you have commented out `encode_multimodals` (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/09e5840d5589ad2d6a8656c0a60f21ae134b3309/llava/model/llava_arch.py#L291C32-L291C55). If I understand correctly, using slow-fast features would require using `self.encode_multimodals` and not `self.encode_images`. Could you clarify this?

sam-motamed

Missing mm_projector.bin in LLaVA-Video series

2

Hi, I am trying to fine-tune LLaVA-Video-7B and LLaVA-Video-72B. However, it seems that the checkpoints for mm_projector.bin have not been released yet. Thanks for the help in advance.

kpc0810

Llava-video slowfast mode

11

Thank you for sharing this great work. I have some questions about mismatches between the current codebase and the arXiv technical report. 1. SlowFast mode * Is slowfast representation only used for...

HYUNJS
