
Results 344 LLaVA-NeXT issues

Judging from the paper, the vision tower should have been tuned, yet the config.json of lmms-lab/llava-onevision-qwen2-7b-ov contains `"mm_vision_tower": "google/siglip-so400m-patch14-384"`, which looks like the original vision tower is being loaded. So I have a question: is the original vision tower loaded first and its parameters then overwritten? During the overwrite there is a warning: ``` envs/llavaov/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for vision_model.encoder.layers.21.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a...

I noticed that when I process a video frame with a standard 16:9 aspect ratio, the processed output frame isn't zero-padded, and the aspect ratio is distorted. Is this intended?...
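For reference, letterbox-style zero padding that preserves the aspect ratio can be sketched as follows (`pad_to_square` is an illustrative helper, not the preprocessing the repo actually applies):

```python
import numpy as np

def pad_to_square(frame: np.ndarray, fill: int = 0) -> np.ndarray:
    """Zero-pad an HxWxC frame to a square canvas, centering the content."""
    h, w = frame.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = np.full((side, side) + frame.shape[2:], fill, dtype=frame.dtype)
    out[top:top + h, left:left + w] = frame
    return out

frame = np.ones((720, 1280, 3), dtype=np.uint8)  # a 16:9 video frame
sq = pad_to_square(frame)
print(sq.shape)  # (1280, 1280, 3): padded, not stretched
```

If the processor instead resizes directly to a square target, the 16:9 content is stretched vertically, which would match the distortion described above.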

LLaVA-OneVision used this `LLaVA-Wild (train)` dataset, but it is not provided in https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data. Furthermore, the paper refers to the original LLaVA paper [83] (see above figure), but this dataset does not match the...

Thanks for your great work. I'm downloading the LLaVA-NeXT instruction tuning data through [lmms-lab/LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data). However, I found that there are around 779k samples in the [parquet directory](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data/tree/main/data) and only 738k samples in...

Hi there! 😊 First of all, thank you so much for your amazing work on LLaVA-NeXT! I was reading about the performance improvements and how it maintains the minimalist design...

I'm running finetune_onevision.sh to finetune on my dataset and I get this error: ``` Traceback (most recent call last): File "/home/ubuntu/LLaVA-NeXT/llava/train/train_mem.py", line 4, in train() File "/home/ubuntu/LLaVA-NeXT/llava/train/train.py", line 1672, in train...

Dear authors, thanks for your remarkable work! I'd like to evaluate the LLaVA-OV model on different datasets, so I looked at the three bash files (eval_all.sh, eval_interleave_3d.sh and eval_multiprocess.sh) in scripts/interleave...

Does anyone know why the shape of outputs.attentions[0][-1] is [1, 754, 28, 28]? 754 is the total number of tokens in the input plus the current outputs, but I wonder what the 28, 28...

Since my server environment does not seem to support Ampere GPUs, I have been trying to disable FlashAttention. First, I simply copied the [train_xformers.py](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/train_xformers.py) and [llama_xformers_attn_monkey_patch.py](https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_xformers_attn_monkey_patch.py) files to my...