LLaVA-NeXT issues

UReader data (kg/qa) in llava-onevision-data does not match with images

2

As discussed in https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/discussions/5, the ureader_kg and ureader_qa data do not match the images. I was able to recover 80-90% of the images by matching suffixes using `id` (adding ".png" or ".jpeg"...

khanrc
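The suffix-matching workaround described in this issue could look roughly like the sketch below. It is only an illustration, not the reporter's script: the function names, the extension list, and the example file names are all hypothetical, and the dataset's real layout may differ.

```python
# Hypothetical extension list; the issue only mentions ".png" and ".jpeg".
CANDIDATE_EXTENSIONS = (".png", ".jpeg")


def match_image(sample_id, image_names, extensions=CANDIDATE_EXTENSIONS):
    """Return the first image name that ends with sample_id + extension, else None.

    Plain suffix matching can produce false positives (an id of "12" also
    matches "img_112.png"), so matches should be spot-checked.
    """
    for ext in extensions:
        suffix = sample_id + ext
        for name in image_names:
            if name.endswith(suffix):
                return name
    return None


def recovery_rate(sample_ids, image_names):
    """Fraction of sample ids for which some image name could be matched."""
    if not sample_ids:
        return 0.0
    matched = sum(match_image(s, image_names) is not None for s in sample_ids)
    return matched / len(sample_ids)
```

Calling `recovery_rate` over all `id` values gives a quick way to check how many samples remain unmatched after the workaround.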

Size Mismatch Issue in ‘mm_projector.bin’ for ‘llava-onevision-qwen2-0.5b-ov’ Model

6

I encountered the following error when running the **finetune_onevision.sh** script using the **mm_projector.bin** file provided at [this link](https://huggingface.co/lmms-lab/llava-onevision-projectors):

```
pretrain_mm_mlp_adapter:/home/LLaVA-NeXT-PROJECT/inputModel/llava-onevision-projectors/0.5b/mm_projector.bin
Traceback (most recent call last):
  File "/home/LLaVA-NeXT-PROJECT/LLaVA-NeXT/llava/train/train_mem.py", line 4, in...
```

fancy12335

Bugs in the inference phase

Model name: LLaVA-NeXT-Video-7B. Inference fails in `llava/model/llava_arch.py`, line 309, in `prepare_inputs_labels_for_multimodal`, at `image_feature = unpad_image(image_feature, image_sizes[image_idx])`, with `TypeError: 'NoneType' object is not subscriptable`.

josephzpng

resources required for training with 72b language models

3

Hi, thanks for your great work. I was wondering how many GPUs are needed to train LLaVA-NeXT with a 72B LLM.

annopackage
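As a rough way to reason about the question above, the sketch below estimates a lower bound on GPU count from model states alone. It assumes mixed-precision training with Adam at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus the two Adam moments), fully sharded across GPUs. The function name and the 80 GB figure are illustrative assumptions; activations, the vision encoder, and framework overhead are ignored, so real requirements will be higher.

```python
import math

# Assumed rule of thumb for mixed-precision Adam:
# 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights, momentum, variance)
BYTES_PER_PARAM_MIXED_ADAM = 16


def min_gpus_for_model_states(num_params, gpu_memory_gb,
                              bytes_per_param=BYTES_PER_PARAM_MIXED_ADAM):
    """Return (model-state size in GB, minimum GPU count to hold it when fully sharded)."""
    total_gb = num_params * bytes_per_param / 1e9
    return total_gb, math.ceil(total_gb / gpu_memory_gb)


# Example: a 72B-parameter language model on hypothetical 80 GB GPUs.
state_gb, gpus = min_gpus_for_model_states(72_000_000_000, 80)
print(f"model states: {state_gb:.0f} GB, at least {gpus} GPUs before activations")
```

This is a floor, not a recommendation: it says nothing about batch size, sequence length, or the setup the authors actually used.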

Failure to reproduce the paper results

1

I cloned the "lmms-lab/LLaVA-NeXT-Interleave-Bench" dataset and "llava-onevision-qwen2-7b-ov" checkpoint from Huggingface to reproduce the results of the paper, but some benchmark results seem to be very different (e.g. IEI, qbench, 3D-Chat,...

yuan-QAQ

(Community Chatting Group) Create a WeChat group so everyone can discuss problems in real time

1

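This is not self-promotion. I just think people may have small questions that do not warrant an issue, and a group would work better for those. If the maintainers need it later, I am happy to hand over administration of the group. My WeChat ID is dreamingforhope; if the QR code has expired, you can add me directly. ![image](https://github.com/user-attachments/assets/4d6e7dfe-0f2c-473f-bf7d-0d99cbfe6fdc)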

chmod777john

Request for Detailed Parameter Breakdown of LLaVA-NeXT-Interleave 7B Model and VRAM Requirements

I'm looking for specific information about the LLaVA-NeXT-Interleave 7B model:

1. Detailed parameter breakdown
   1. Language Model (LLM) size
   2. Image Encoder size
   3. Projector size
2. VRAM requirements for...

YoungjaeDev

checkpoint for finetuning / datafile for finetuning

3

https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_onevision.sh Is this the script for SFT? Where can we find the following checkpoint for fine-tuning? `"/checkpoints/projectors/${BASE_RUN_NAME}/mm_projector.bin" \` Also, could anyone post a data preparation script for fine-tuning?

YerongLi

update video code

2

# Update video code

1. From 1fps to uniformly sampled
2. add new_line logic
3. add faster token logic

ZhangYuanhan-AI
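To illustrate the first item of this pull request, the sketch below contrasts picking one frame per second with picking a fixed number of frames spread evenly over the clip. It is not the repository's code: both function names and all parameter values are hypothetical, and it only computes frame indices without decoding any video.

```python
def one_fps_indices(total_frames, video_fps):
    """Frame indices when taking roughly one frame per second of video."""
    step = max(1, round(video_fps))
    return list(range(0, total_frames, step))


def uniform_indices(total_frames, num_samples):
    """Indices of num_samples frames spread evenly from the first to the last frame."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples == 1:
        return [0]
    if num_samples >= total_frames:
        # Fewer frames than requested samples: use every frame once.
        return list(range(total_frames))
    last = total_frames - 1
    # Integer arithmetic keeps results deterministic and always includes frame 0 and the last frame.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


# A 90-frame clip at 30 fps: 1 fps yields 3 frames, uniform sampling yields exactly 5.
print(one_fps_indices(90, 30))
print(uniform_indices(90, 5))
```

With 1 fps sampling the number of frames grows with clip length, whereas uniform sampling keeps the frame count, and therefore the visual token budget, fixed.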

LLaVA-NeXT-Video-32B-Qwen gives inaccurate video analysis.

I used this Space: https://huggingface.co/spaces/WildVision/vision-arena Video used: https://www.youtube.com/watch?v=51gdmOKs4Ek Prompt: Is the elderly person in the video safe and comfortable? Response by LLaVA-NeXT-Video-32B-Qwen: Yes, the elderly person appears to...

taxom-techlead
