LLaVA-NeXT
Hello, I am curious about how you constructed the temporal QA set and performed instruction tuning. Could you provide an example or introduce any related datasets?
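For reference, an instruction-tuning sample for temporal QA is typically a JSON record pairing a video with question/answer turns. The sketch below is a guess at that shape: the field names (`video`, `conversations`, `from`/`value`) follow the common LLaVA annotation convention, but the actual temporal QA schema is not confirmed by the authors.

```python
import json

# Hypothetical temporal QA record in LLaVA-style instruction format.
# Field names and the example path are illustrative assumptions.
sample = {
    "video": "activitynet/v_example_clip.mp4",
    "conversations": [
        {
            "from": "human",
            "value": "<video>\nWhat does the person do immediately after picking up the cup?",
        },
        {
            "from": "gpt",
            "value": "After picking up the cup, the person takes a sip and places it back on the table.",
        },
    ],
}

def validate_sample(record):
    """Check the minimal structure an instruction-tuning loader would expect."""
    assert "video" in record and record["conversations"], "missing required fields"
    roles = [turn["from"] for turn in record["conversations"]]
    # Turns must strictly alternate human -> gpt.
    assert roles == ["human", "gpt"] * (len(roles) // 2), "turns must alternate"
    return json.dumps(record)

serialized = validate_sample(sample)
```

A real dataset would be a JSON array of many such records, one per annotated video segment.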
The llava-next-video-34b DPO model is not performing well, whereas the 7B DPO model works fine. I've reviewed the related issues and tried **changing the conv mode to `mistral_direct`**, but the responses still...
When I run the following code:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lmms-lab/llava-next-interleave-qwen-7b")
```

I get this error:

```
ValueError: The checkpoint you are trying to load has model type `llava_qwen` but Transformers...
```
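This error arises because `llava_qwen` is a custom model type registered by the LLaVA-NeXT codebase, not by the installed Transformers release, so the Auto class has no architecture to dispatch to. A simplified, pure-Python sketch of that dispatch logic (the registry and names here are illustrative, not the actual Transformers internals):

```python
# Simplified sketch of Auto-class dispatch: config "model_type" -> model class.
# Real Transformers keeps this in a lazy registry; entries here are illustrative.
MODEL_REGISTRY = {
    "llama": "LlamaForCausalLM",
    "qwen2": "Qwen2ForCausalLM",
}

def resolve_model_class(model_type: str) -> str:
    """Return the model class name for a checkpoint's model_type, failing
    the same way Transformers does for an unrecognized type."""
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        )

# `llava_qwen` only enters the registry once the LLaVA-NeXT package itself
# registers it, which is why a stock Transformers install cannot load it.
```

In practice this means the checkpoint must be loaded through the LLaVA-NeXT repo's own loading path (or a Transformers version that includes the architecture), rather than plain `AutoModelForCausalLM`.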
Minimal example:

```python
from transformers import AutoTokenizer

from llava.train.train import preprocess_qwen

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Dummy example data
conversations = [
    {'from': 'human', 'value': '\nProvide a brief description of the given image.'},
    {'from': 'gpt', 'value': ...
```
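For context, `preprocess_qwen` ultimately renders such a conversation into Qwen's ChatML prompt format before tokenization. A standalone sketch of that rendering is below; the system prompt and the role mapping are assumptions for illustration, and the real function additionally builds label masks for training.

```python
# Render a LLaVA-style conversation into Qwen ChatML text.
# The system prompt and role names are illustrative assumptions.
ROLE_MAP = {"human": "user", "gpt": "assistant"}

def render_chatml(conversations, system="You are a helpful assistant."):
    """Join turns into ChatML: <|im_start|>role\\ncontent<|im_end|> blocks."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

conversations = [
    {"from": "human", "value": "Provide a brief description of the given image."},
    {"from": "gpt", "value": "A cat sleeping on a windowsill in the sun."},
]
prompt = render_chatml(conversations)
```

The rendered string is what the tokenizer actually sees, which is useful when debugging mismatches between the conversation dict and the expected prompt.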
I am interested in the fine-tuning data. I see from the description in the other LLaVA repo that CLIP is part of the model architecture for 1.5. If this is...
Hi, dear authors: I appreciate that the LLaVA series is fundamental and solid work in the VLM domain, and you have continued to propose so many versions of great work, which is...
This script ([scripts/train/finetune_siglip_a4.sh](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_siglip_a4.sh)) was used to train Mistral-v0.3 as a vision model. The training seemed to work and the model was output, but the model's architecture is `LlavaMistralForCausalLM`, so it cannot...
Check out the three YAMLs here: https://github.com/LLaVA-VL/LLaVA-NeXT/tree/main/scripts/train
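For orientation, those YAMLs describe dataset mixtures for training. The fragment below is a minimal sketch of the assumed format, with hypothetical paths and the `datasets` / `json_path` / `sampling_strategy` keys; verify the exact schema against the linked files.

```yaml
# Illustrative data-mixture config; keys and paths are assumptions
# mirroring the LLaVA-NeXT training YAMLs -- check the linked scripts.
datasets:
  - json_path: /data/llava_instruct/mix_stage2.json
    sampling_strategy: all
  - json_path: /data/video_qa/temporal_qa.json
    sampling_strategy: "first:50%"
```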