
Results: 315 LLaVA-NeXT issues, sorted by most recently updated

I fine-tuned `llava-onevision-qwen2-7b-si` on my custom data. After fine-tuning, I tried to run inference with the fine-tuned model, but when I call the `load_pretrained_model` function I hit an error...
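For reference, a minimal loading sketch under stated assumptions: the checkpoint path is a placeholder, and `"llava_qwen"` as the model name is taken from the other checkpoint-loading report further down this page, not from this issue.

```python
from llava.model.builder import load_pretrained_model

# Sketch only: load a locally fine-tuned OneVision checkpoint.
# The path is a placeholder; "llava_qwen" mirrors the model_name used in the
# checkpoint-loading report later in this list.
pretrained = "/path/to/finetuned-llava-onevision-qwen2-7b-si"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,     # model_path: the fine-tuned checkpoint directory
    None,           # model_base: None for a full (non-LoRA) checkpoint
    "llava_qwen",   # model_name: selects which LLaVA architecture is built
    device_map="auto",
)
model.eval()
```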

I re-downloaded this repo and tried `transformers` versions `4.40.0.dev`, `4.40.0`, and `4.41.2`; the result is still `['']`. Some notes on my setup: all weights I use are local. Below are my changes. 1. `Meta-Llama-3-8B-Instruct`: in llava/conversation.py, line 387, tokenizer=AutoTokenizer.from_pretrained("local_path/LLaVA-NeXT/Meta-Llama-3-8B-Instruct") 2....
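For context, the change described in step 1 amounts to loading the tokenizer from a local directory instead of the Hub name. A minimal sketch, assuming the local copy contains the full set of tokenizer files:

```python
from transformers import AutoTokenizer

# Sketch of the edit described in step 1: point the tokenizer at a local copy
# of Meta-Llama-3-8B-Instruct rather than downloading it from the Hub.
tokenizer = AutoTokenizer.from_pretrained(
    "local_path/LLaVA-NeXT/Meta-Llama-3-8B-Instruct",  # placeholder local path
    local_files_only=True,  # fail fast if the local files are incomplete
)
```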

Hi! Thanks for your great work. Could you please release the following evaluation scripts? model_video_chatgpt_general.py, eval_activitynet_qa.py, model_video_detail_description.py

Getting `ValueError: Unknown vision tower: google/siglip-so400m-patch14-384` when running https://github.com/LLaVA-VL/LLaVA-NeXT/blob/5fbcf27e32935f4e09d6b8b9f8abed4a572240b0/docs/LLaVA_OneVision_Tutorials.ipynb I did these steps before running the code: https://github.com/haotian-liu/LLaVA/issues/1101#issuecomment-1933697654

Running the demo in _docs/LLaVA_OneVision_Tutorials.ipynb_ with video input, an error occurs: _"AttributeError: 'LlavaQwenConfig' object has no attribute 'mm_newline_position'"_
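A hedged workaround sketch, not a confirmed fix: it assumes the missing attribute can simply be supplied on the loaded config, and the default value `"one_token"` is an assumption that should be checked against the values the repository actually supports.

```python
# Workaround sketch (assumption): older or converted configs may lack
# mm_newline_position, so give it a default after loading the model.
# "one_token" is an assumed value; verify against the repo's supported options.
if not hasattr(model.config, "mm_newline_position"):
    model.config.mm_newline_position = "one_token"
```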

Thank you for your contribution. Under the Hugging Face `lmms-lab/LLaVA-OneVision-Data` repo, I find only single-image data, yet in your `scripts/train/README.md` you say that the video data incorporates **Youcook2 (32267),...

After fine-tuning and saving a checkpoint, I could not load the model:
```python
pretrained = "/workspace/checkpoints/checkpoint-1006"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None,...
```

Hi, just wanted to share this conversion script, part of a PR to integrate LLaVA-OneVision into the transformers package: https://github.com/zucchini-nlp/transformers/blob/llava-onevision/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py It works well for the original LLaVA-OneVision checkpoints, and...

Currently, running the demo notebook for LLaVA OneVision in the video modality doesn't apply pooling to all video patches/frames, because the `modality` list holds one value per prompt, while videos can...
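For context, a hedged sketch of the kind of call being discussed: the keyword name `modalities`, the variable names, and the generation settings below are assumptions modeled on the tutorial-style video call, not the notebook's exact code.

```python
# Hedged sketch (placeholder names): the report above is that the per-prompt
# modality list is what decides whether pooling runs over a video's
# patches/frames. Expanding it to one entry per visual input is shown here
# only to illustrate the mismatch, not as a verified fix.
modalities = ["video"] * len(video_frames_batch)  # one entry per video input (assumption)
output_ids = model.generate(
    input_ids,
    images=video_frames_batch,
    modalities=modalities,
    do_sample=False,
    max_new_tokens=256,
)
```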