LLaVA-NeXT
```
Traceback (most recent call last):
  File "/home/ubuntu/taoyu/LLaVA-NeXT/llava/eval/model_video_detail_description.py", line 197, in <module>
    run_inference(args)
  File "/home/ubuntu/taoyu/LLaVA-NeXT/llava/eval/model_video_detail_description.py", line 175, in run_inference
    model.update_prompt([[cur_prompt]])
  File "/home/ubuntu/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute...
```
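The traceback means the loaded model class does not define `update_prompt`, so `torch.nn.Module.__getattr__` raises. A minimal defensive workaround is to guard the call; `model` and `cur_prompt` are the names from the script above, and the guard itself is my addition, not upstream code:
```python
# Only call update_prompt when the loaded model class actually defines it;
# torch.nn.Module.__getattr__ raises AttributeError for unknown attributes.
if hasattr(model, "update_prompt"):
    model.update_prompt([[cur_prompt]])
```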
In train_dpo.py, line 41 imports data_processing, which was removed in the latest commit:
```
from data_processing.utils import load_jsonl, load_json
```
This fails with `ModuleNotFoundError: No module named 'data_processing'`.
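Until the module is restored, a local stand-in for the two helpers unblocks the import. This is a sketch assuming the usual semantics implied by the names, not the removed file's actual code:
```python
import json

def load_json(path):
    """Load one JSON document from a file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def load_jsonl(path):
    """Load a JSON Lines file: one JSON object per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```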
The training run prints the suggestion "consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time", followed by:
```
Cache cleared
{'loss': 1.0039, 'grad_norm': 4.742300987243652, 'learning_rate': 2.7891156462585034e-06, 'epoch': 0.01}...
```
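A minimal sketch of what that suggestion looks like in a DeepSpeed training loop; the loop body and the `model_engine`/`train_loader` names are placeholders rather than code from the original report, while `get_accelerator` is DeepSpeed's device-agnostic accelerator API:
```python
from deepspeed.accelerator import get_accelerator

for step, batch in enumerate(train_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    model_engine.step()
    # Flush the allocator cache periodically, at the same point on every
    # rank, so all ranks empty their caches together.
    if step % 100 == 0:
        get_accelerator().empty_cache()
```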
Add me on WeChat: yzyykm666, with the note "MLLM", and I'll invite you to the group!!!
Hello, thanks for contributing a very exciting model! I noticed that the interleaved and video inference examples given in the notebooks are set up with different configs than the model...
It seems that the training recipe for LLaVA has changed since LLaVA-NeXT. While it was previously common practice to finetune only the connector and the LLM during instruction tuning, now...
https://github.com/LLaVA-VL/LLaVA-NeXT/blob/56cdba265cc786454115f98e5da967a99b532263/llava/model/llava_arch.py#L449 I don't quite understand: isn't it just
```
if num_images == 0:
    cur_image_features = image_features[cur_image_idx]
    cur_input_embeds_1 = self.get_model().embed_tokens(cur_input_ids)
    cur_input_embeds = cur_input_embeds_1
    new_input_embeds.append(cur_input_embeds)
    new_labels.append(labels[batch_idx])
```
?
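For context, the branch at that line (as I read it in the repo at that commit) differs from the simplification above mainly by a zero-length concat; the comments and the stated rationale are my own inference, not the authors' explanation:
```python
if num_images == 0:
    cur_image_features = image_features[cur_image_idx]
    cur_input_embeds_1 = self.get_model().embed_tokens(cur_input_ids)
    # Concatenating a zero-length slice leaves the embeddings unchanged,
    # but presumably keeps the vision tower's output in the autograd graph
    # so distributed training does not see unused parameters.
    cur_input_embeds = torch.cat([cur_input_embeds_1, cur_image_features[0:0]], dim=0)
    new_input_embeds.append(cur_input_embeds)
    new_labels.append(labels[batch_idx])
    cur_image_idx += 1
    continue
```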
I see the tqa subset in https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data has 27.3k rows, the same count as iconqa, but the paper gives TQA as 1.4K. I...
https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision.md#evaluating-llava-onevision-on-multiple-datasets I followed the instructions there to evaluate LLaVA-OneVision finetuned on my dataset. This is my command; I used the 'include_path' argument for my finetuned model. > accelerate launch...