LLaVA-NeXT
Hello, I am curious about how you constructed the temporal QA set and performed instruction tuning. Could you provide an example or introduce any related datasets?
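For reference, an instruction-tuning sample for temporal QA is typically a JSON record pairing a video with question/answer turns. The sketch below is a guess at that shape: the field names (`video`, `conversations`, `from`/`value`) follow the common LLaVA annotation convention, but the actual temporal QA schema is not confirmed by the authors.

```python
import json

# Hypothetical temporal QA record in LLaVA-style instruction format.
# Field names and the example path are illustrative assumptions.
sample = {
    "video": "activitynet/v_example_clip.mp4",
    "conversations": [
        {
            "from": "human",
            "value": "<video>\nWhat does the person do immediately after picking up the cup?",
        },
        {
            "from": "gpt",
            "value": "After picking up the cup, the person takes a sip and places it back on the table.",
        },
    ],
}

def validate_sample(record):
    """Check the minimal structure an instruction-tuning loader would expect."""
    assert "video" in record and record["conversations"], "missing required fields"
    roles = [turn["from"] for turn in record["conversations"]]
    # Turns must strictly alternate human -> gpt.
    assert roles == ["human", "gpt"] * (len(roles) // 2), "turns must alternate"
    return json.dumps(record)

serialized = validate_sample(sample)
```

A real dataset would be a JSON array of many such records, one per annotated video segment.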
The llava-next-video-34b DPO model is not performing well, whereas the 7B DPO model works fine. I've reviewed the related issues and tried **changing the conv mode to `mistral_direct`**, but the responses still...
When I run the following code:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lmms-lab/llava-next-interleave-qwen-7b")
```

I get this error:

```
ValueError: The checkpoint you are trying to load has model type `llava_qwen` but Transformers...
```
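This error arises because `llava_qwen` is a custom model type registered by the LLaVA-NeXT codebase, not by the installed Transformers release, so the Auto class has no architecture to dispatch to. A simplified, pure-Python sketch of that dispatch logic (the registry and names here are illustrative, not the actual Transformers internals):

```python
# Simplified sketch of Auto-class dispatch: config "model_type" -> model class.
# Real Transformers keeps this in a lazy registry; entries here are illustrative.
MODEL_REGISTRY = {
    "llama": "LlamaForCausalLM",
    "qwen2": "Qwen2ForCausalLM",
}

def resolve_model_class(model_type: str) -> str:
    """Return the model class name for a checkpoint's model_type, failing
    the same way Transformers does for an unrecognized type."""
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        )

# `llava_qwen` only enters the registry once the LLaVA-NeXT package itself
# registers it, which is why a stock Transformers install cannot load it.
```

In practice this means the checkpoint must be loaded through the LLaVA-NeXT repo's own loading path (or a Transformers version that includes the architecture), rather than plain `AutoModelForCausalLM`.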
Minimal example:

```python
from transformers import AutoTokenizer

from llava.train.train import preprocess_qwen

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Dummy example data
conversations = [
    {'from': 'human', 'value': '\nProvide a brief description of the given image.'},
    {'from': 'gpt', 'value': ...
```
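For context, `preprocess_qwen` ultimately renders such a conversation into Qwen's ChatML prompt format before tokenization. A standalone sketch of that rendering is below; the system prompt and the role mapping are assumptions for illustration, and the real function additionally builds label masks for training.

```python
# Render a LLaVA-style conversation into Qwen ChatML text.
# The system prompt and role names are illustrative assumptions.
ROLE_MAP = {"human": "user", "gpt": "assistant"}

def render_chatml(conversations, system="You are a helpful assistant."):
    """Join turns into ChatML: <|im_start|>role\\ncontent<|im_end|> blocks."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

conversations = [
    {"from": "human", "value": "Provide a brief description of the given image."},
    {"from": "gpt", "value": "A cat sleeping on a windowsill in the sun."},
]
prompt = render_chatml(conversations)
```

The rendered string is what the tokenizer actually sees, which is useful when debugging mismatches between the conversation dict and the expected prompt.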
I am interested in the fine-tuning data. I see from the description in the other LLaVA repo that CLIP is part of the model architecture for 1.5. If this is...
Hi, dear authors: I appreciate that the LLaVA series is fundamental and solid work in the VLM domain, and you have continued to propose so many versions of great work, which is...
This script ([scripts/train/finetune_siglip_a4.sh](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_siglip_a4.sh)) was used to train Mistral-v0.3 as a vision model. The training seemed to work and the model was output, but the model's architecture is `LlavaMistralForCausalLM`, so it cannot...
Check out the three YAMLs here: https://github.com/LLaVA-VL/LLaVA-NeXT/tree/main/scripts/train
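For orientation, those YAMLs describe dataset mixtures for training. The fragment below is a minimal sketch of the assumed format, with hypothetical paths and the `datasets` / `json_path` / `sampling_strategy` keys; verify the exact schema against the linked files.

```yaml
# Illustrative data-mixture config; keys and paths are assumptions
# mirroring the LLaVA-NeXT training YAMLs -- check the linked scripts.
datasets:
  - json_path: /data/llava_instruct/mix_stage2.json
    sampling_strategy: all
  - json_path: /data/video_qa/temporal_qa.json
    sampling_strategy: "first:50%"
```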