
Results: 315 LLaVA-NeXT issues, sorted by most recently updated

I fine-tuned `llava-onevision-qwen2-7b-si` on my custom data. After fine-tuning, I tried to run inference with the fine-tuned model, but when I call the `load_pretrained_model` function I hit an error...
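For reference, a minimal loading sketch under stated assumptions: the checkpoint path is a placeholder, and `"llava_qwen"` as the model name is taken from the other checkpoint-loading report further down this page, not from this issue.

```python
from llava.model.builder import load_pretrained_model

# Sketch only: load a locally fine-tuned OneVision checkpoint.
# The path is a placeholder; "llava_qwen" mirrors the model_name used in the
# checkpoint-loading report later in this list.
pretrained = "/path/to/finetuned-llava-onevision-qwen2-7b-si"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,     # model_path: the fine-tuned checkpoint directory
    None,           # model_base: None for a full (non-LoRA) checkpoint
    "llava_qwen",   # model_name: selects which LLaVA architecture is built
    device_map="auto",
)
model.eval()
```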

I re-downloaded this repo and tried `transformers` versions `4.40.0.dev`, `4.40.0`, and `4.41.2`; the result is still `['']`. Some notes on my setup: all weights I use are local. Below are my changes. 1. `Meta-Llama-3-8B-Instruct`: in llava/conversation.py, line 387, tokenizer=AutoTokenizer.from_pretrained("local_path/LLaVA-NeXT/Meta-Llama-3-8B-Instruct") 2....
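For context, the change described in step 1 amounts to loading the tokenizer from a local directory instead of the Hub name. A minimal sketch, assuming the local copy contains the full set of tokenizer files:

```python
from transformers import AutoTokenizer

# Sketch of the edit described in step 1: point the tokenizer at a local copy
# of Meta-Llama-3-8B-Instruct rather than downloading it from the Hub.
tokenizer = AutoTokenizer.from_pretrained(
    "local_path/LLaVA-NeXT/Meta-Llama-3-8B-Instruct",  # placeholder local path
    local_files_only=True,  # fail fast if the local files are incomplete
)
```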

Hi! Thanks for your great work. Could you please release the following evaluation scripts? model_video_chatgpt_general.py, eval_activitynet_qa.py, model_video_detail_description.py

Getting `ValueError: Unknown vision tower: google/siglip-so400m-patch14-384` when running https://github.com/LLaVA-VL/LLaVA-NeXT/blob/5fbcf27e32935f4e09d6b8b9f8abed4a572240b0/docs/LLaVA_OneVision_Tutorials.ipynb I did these steps before running the code: https://github.com/haotian-liu/LLaVA/issues/1101#issuecomment-1933697654

Running the demo in _docs/LLaVA_OneVision_Tutorials.ipynb_ with video input, an error occurs: _"AttributeError: 'LlavaQwenConfig' object has no attribute 'mm_newline_position'"_
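A hedged workaround sketch, not a confirmed fix: it assumes the missing attribute can simply be supplied on the loaded config, and the default value `"one_token"` is an assumption that should be checked against the values the repository actually supports.

```python
# Workaround sketch (assumption): older or converted configs may lack
# mm_newline_position, so give it a default after loading the model.
# "one_token" is an assumed value; verify against the repo's supported options.
if not hasattr(model.config, "mm_newline_position"):
    model.config.mm_newline_position = "one_token"
```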

Thank you for your contribution. Under the Hugging Face `lmms-lab/LLaVA-OneVision-Data` repo, I find only single-image data, yet in your `scripts/train/README.md` you say that the video data incorporates **Youcook2 (32267),...

After fine-tuning and saving a checkpoint, I could not load the model:
```python
pretrained = "/workspace/checkpoints/checkpoint-1006"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None,...
```

Hi, just wanted to share this conversion script, part of a PR to integrate LLaVA-OneVision into the transformers package: https://github.com/zucchini-nlp/transformers/blob/llava-onevision/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py It works well for the original LLaVA-OneVision checkpoints, and...

Currently, running the demo notebook for LLaVA OneVision in the video modality doesn't apply pooling to all video patches/frames, because the `modality` list holds one value per prompt, while videos can...
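For context, a hedged sketch of the kind of call being discussed: the keyword name `modalities`, the variable names, and the generation settings below are assumptions modeled on the tutorial-style video call, not the notebook's exact code.

```python
# Hedged sketch (placeholder names): the report above is that the per-prompt
# modality list is what decides whether pooling runs over a video's
# patches/frames. Expanding it to one entry per visual input is shown here
# only to illustrate the mismatch, not as a verified fix.
modalities = ["video"] * len(video_frames_batch)  # one entry per video input (assumption)
output_ids = model.generate(
    input_ids,
    images=video_frames_batch,
    modalities=modalities,
    do_sample=False,
    max_new_tokens=256,
)
```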