
Results 315 LLaVA-NeXT issues

As discussed in https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/discussions/5, the ureader_kg and ureader_qa data are not matched with their images. I was able to recover 80-90% of the images by matching suffixes using `id` (adding ".png" or ".jpeg"...
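The suffix-matching recovery described in this issue could be sketched roughly as follows. This is not the poster's actual script: the `id` field and the two suffixes come from the issue text, while the directory layout, the function names, and the record format (a list of dicts) are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

# Suffixes named in the issue; extend this tuple if the dataset uses others.
CANDIDATE_SUFFIXES = (".png", ".jpeg")


def find_image(record_id: str, image_root: Path,
               suffixes: Iterable[str] = CANDIDATE_SUFFIXES) -> Optional[Path]:
    """Return the first existing file named <record_id><suffix>, or None."""
    for suffix in suffixes:
        candidate = image_root / f"{record_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def match_records(records, image_root: Path):
    """Split records into those with a recovered image and those without.

    `records` is assumed to be an iterable of dicts with an "id" key
    (hypothetical format, chosen for this sketch).
    """
    matched, unmatched = {}, []
    for record in records:
        path = find_image(record["id"], image_root)
        if path is None:
            unmatched.append(record["id"])
        else:
            matched[record["id"]] = path
    return matched, unmatched
```

The unmatched list makes it easy to inspect which ids still lack an image after the suffix pass.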

I encountered the following error when running the **finetune_onevision.sh** script using the **mm_projector.bin** file provided at [this link](https://huggingface.co/lmms-lab/llava-onevision-projectors):
```
pretrain_mm_mlp_adapter:/home/LLaVA-NeXT-PROJECT/inputModel/llava-onevision-projectors/0.5b/mm_projector.bin
Traceback (most recent call last):
  File "/home/LLaVA-NeXT-PROJECT/LLaVA-NeXT/llava/train/train_mem.py", line 4, in...
```

Model name: LLaVA-NeXT-Video-7B
llava/model/llava_arch.py", line 309, in prepare_inputs_labels_for_multimodal
image_feature = unpad_image(image_feature, image_sizes[image_idx])
TypeError: 'NoneType' object is not subscriptable

Hi, thanks for your great work. I was wondering how many GPUs are needed to train LLaVA-NeXT with a 72B LLM.

I cloned the "lmms-lab/LLaVA-NeXT-Interleave-Bench" dataset and the "llava-onevision-qwen2-7b-ov" checkpoint from Hugging Face to reproduce the results of the paper, but some benchmark results seem to be very different (e.g. IEI, qbench, 3D-Chat,...

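This is not self-promotion. I just thought that people may have small questions that don't warrant an issue, so a group chat would be useful. If the official maintainers need it later, I am willing to hand over administration of the group. My WeChat ID is dreamingforhope; if the QR code has expired, you can add me directly. ![image](https://github.com/user-attachments/assets/4d6e7dfe-0f2c-473f-bf7d-0d99cbfe6fdc)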

I'm looking for specific information about the LLaVA-NeXT-Interleave 7B model:
1. Detailed parameter breakdown
   1. Language Model (LLM) size
   2. Image Encoder size
   3. Projector size
2. VRAM requirements for...

https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_onevision.sh Is this the script for SFT? Where can we find the following checkpoint for fine-tuning? `"/checkpoints/projectors/${BASE_RUN_NAME}/mm_projector.bin" \` Also, could anyone post a data preparation script for fine-tuning?

# Update video code
1. From 1fps to uniformly sampled
2. add new_line logic
3. add faster token logic
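To illustrate the first item, here is a minimal sketch of the difference between picking one frame per second and picking a fixed number of evenly spaced frames. It is not the repository's implementation; the function names and the choice to include both the first and last frame are assumptions made for this example.

```python
def one_fps_indices(total_frames: int, fps: float) -> list[int]:
    """Pick roughly one frame per second of video."""
    step = max(1, round(fps))
    return list(range(0, total_frames, step))


def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly over the whole video.

    Assumption for this sketch: the first and last frames are included.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested samples: use every frame once.
        return list(range(total_frames))
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]
```

With 1 fps sampling the number of frames grows with video length, whereas uniform sampling keeps the frame count fixed regardless of duration.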

I used this space: https://huggingface.co/spaces/WildVision/vision-arena
Video used: https://www.youtube.com/watch?v=51gdmOKs4Ek
Prompt: Is the elderly person in the video safe and comfortable?
Response by LLava-Next-Video-32B-Qwen: Yes, the elderly person appears to...