
315 LLaVA-NeXT issues

I am trying to use the llama3-llava-next-8b model, and I replaced --model-path with the local path of llama3-llava-next-8b that I downloaded. When I run python -m llava.serve.model_worker --host 0.0.0.0 --controller...
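
As a first sanity check that a locally downloaded checkpoint loads at all (outside the serving stack), here is a minimal Python sketch, assuming the repo's loader helpers follow the upstream LLaVA API (`load_pretrained_model`, `get_model_name_from_path`); the local path is a placeholder:

```python
# Minimal sketch: load llama3-llava-next-8b from a local directory.
# Assumes load_pretrained_model / get_model_name_from_path behave as in
# upstream LLaVA; the path below is a placeholder for your download.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "/path/to/llama3-llava-next-8b"
model_name = get_model_name_from_path(model_path)  # should contain "llava" so the right loader is picked

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, model_name, device_map="auto"
)
print(type(model).__name__, context_len)
```

If this loads cleanly, the same local path should also be usable as the `--model-path` argument of the model worker.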

Hey! I want to use LLaVA-OV to do some inference. I read the paper to find the training prompt for each type of question (Table 18), but the full prompt...

I'm experiencing high memory usage in the DataLoader workers when using a custom dataset class for lazy loading large datasets. This leads to Out-of-Memory (OOM) errors during training. I've observed...
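
A common culprit is keeping every decoded sample inside the dataset object, which each DataLoader worker then effectively duplicates (copy-on-write after fork). Below is a minimal sketch of a lazy dataset that stores only byte offsets and decodes one record per `__getitem__`; the JSONL layout with image paths is a hypothetical example, not this repo's format:

```python
# Minimal sketch of a lazy Dataset that avoids holding decoded samples in RAM.
# Hypothetical file layout: one JSON record per line, each with an image path.
import json
from PIL import Image
from torch.utils.data import Dataset

class LazyJsonlDataset(Dataset):
    def __init__(self, jsonl_path):
        # Keep only byte offsets in memory, not the decoded records themselves.
        self.path = jsonl_path
        self.offsets = []
        offset = 0
        with open(jsonl_path, "rb") as f:
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Decode the record (and its image) only when a worker asks for it.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            record = json.loads(f.readline())
        image = Image.open(record["image"]).convert("RGB")
        return record["conversations"], image
```

Storing the offsets in a numpy array rather than a Python list further limits copy-on-write growth across workers.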

When I fine-tune using LoRA, the model does not converge well. The hyperparameters are set as follows: --lora_enable True \ --deepspeed scripts/zero3.json \ --model_name_or_path ${MODEL} \ --version ${PROMPT_VERSION} \...

Following the evaluation section, the current llava no longer seems to have llava_vid, and there appears to be a similar error under lmms-eval [#242 in lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval/issues/242). I'd like to ask whether there is a quick workaround, and if I need to adapt it myself, how should I go about it?

How do I run inference on multiple images? I tried to input 2 images but encountered an error: ValueError: Number of image tokens in input_ids (1) different from num_images (2). Here...
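
That error usually means the prompt contained one image placeholder while two images were passed. Below is a minimal sketch that inserts one placeholder per image, assuming the upstream LLaVA-style helpers (`tokenizer_image_token`, `process_images`, `conv_templates`) and the `"qwen_1_5"` template name; adjust to the model you load, and note that `tokenizer`, `model`, and `image_processor` come from `load_pretrained_model`:

```python
# Minimal sketch of two-image inference: one <image> placeholder per input image.
# Helper names and the conversation template are assumptions based on upstream LLaVA.
import torch
from PIL import Image
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

images = [Image.open(p).convert("RGB") for p in ["left.jpg", "right.jpg"]]  # placeholder paths

# One image placeholder per input image, followed by the question.
question = (DEFAULT_IMAGE_TOKEN + "\n") * len(images) + "What differs between the two images?"
conv = conv_templates["qwen_1_5"].copy()   # template name is an assumption
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

image_tensors = process_images(images, image_processor, model.config)
if isinstance(image_tensors, list):  # anyres preprocessing may return a list of tensors
    image_tensors = [t.to(model.device, dtype=torch.float16) for t in image_tensors]
else:
    image_tensors = image_tensors.to(model.device, dtype=torch.float16)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensors,
        image_sizes=[img.size for img in images],
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```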

Hi, thanks for your work. When I run the demo code from https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2 in your LLaVA-NeXT repository, some errors occur: ``` size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape...

Hi, thanks for the effort and amazing work! I want to download some parts of the M4-Instruct dataset, hosted [here](https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data/tree/main). The following `.zip` files are available: ``` AESOP.zip ALFRED.zip...
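
To fetch only selected archives rather than cloning the whole dataset repo, `huggingface_hub.snapshot_download` with an `allow_patterns` filter is one option; the file names below are just examples taken from the list above:

```python
# Minimal sketch: download only selected .zip archives from the dataset repo.
# Requires `pip install huggingface_hub`; the patterns are examples.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lmms-lab/M4-Instruct-Data",
    repo_type="dataset",
    allow_patterns=["AESOP.zip", "ALFRED.zip"],  # add the parts you actually need
    local_dir="M4-Instruct-parts",
)
```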

I would like to know whether training the single-image version of llava-ov requires reading the training data in the order given in single_image.yaml, or whether the order is random.

I fine-tuned llava-onevision from lmms-lab/llava-onevision-qwen2-7b-ov with the config --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 and have a checkpoint saved. How can I use this model for eval? ![image](https://github.com/user-attachments/assets/4a962bbf-64ac-49e3-a49a-8565cb34d1b0)
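
One way to evaluate such a checkpoint is to merge the LoRA adapter back into its base model and point the eval tooling (e.g. lmms-eval) at the merged weights. A minimal sketch, assuming `load_pretrained_model` merges LoRA weights when `model_base` is given and the checkpoint name contains "lora" (as in upstream LLaVA's merge script); the paths are placeholders:

```python
# Minimal sketch: merge a LoRA checkpoint into its base model and save the
# result so standard eval tooling can load it by path.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

lora_path = "checkpoints/llava-onevision-qwen2-7b-ov-lora"   # your saved LoRA checkpoint
base_path = "lmms-lab/llava-onevision-qwen2-7b-ov"           # the model you fine-tuned from

tokenizer, model, image_processor, context_len = load_pretrained_model(
    lora_path, base_path, get_model_name_from_path(lora_path)
)

merged_dir = "checkpoints/llava-onevision-qwen2-7b-ov-merged"
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```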