LLaVA-NeXT
Thanks for open-sourcing this powerful model. The examples show how to generate prompts for text+image and text+video, but if I want to input an image, a video, and text at the same time, how...
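A minimal sketch of one way this might be done, not the repo's documented interface: it splices together the image and video examples from docs/LLaVA_OneVision_Tutorials.ipynb and assumes `modalities` accepts one entry per visual input, matching the order of the `<image>` placeholders in the prompt; the file paths, frame count, and the `image_sizes` handling for the video entry are assumptions.

```python
# A sketch combining the tutorial's image and video examples; the mixed
# `modalities` list and the video entry's image_sizes are assumptions.
import copy
import numpy as np
from PIL import Image
from decord import VideoReader, cpu

from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates

tokenizer, model, image_processor, _ = load_pretrained_model(
    "lmms-lab/llava-onevision-qwen2-7b-ov", None, "llava_qwen", device_map="auto")
model.eval()

# Still image, processed as in the tutorial's image example.
image = Image.open("example.jpg")  # hypothetical path
image_tensor = process_images([image], image_processor, model.config)[0].half().cuda()

# Uniformly sampled video frames, processed as in the video example.
vr = VideoReader("example.mp4", ctx=cpu(0))  # hypothetical path
idx = np.linspace(0, len(vr) - 1, 16, dtype=int).tolist()
frames = vr.get_batch(idx).asnumpy()
video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].half().cuda()

# One <image> placeholder per visual input, in the order of `images` below.
question = (f"{DEFAULT_IMAGE_TOKEN}\n{DEFAULT_IMAGE_TOKEN}\n"
            "Does the object shown in the image appear anywhere in the video?")
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX,
                                  return_tensors="pt").unsqueeze(0).cuda()

out = model.generate(
    input_ids,
    images=[image_tensor, video_tensor],
    image_sizes=[image.size, None],  # size for the video entry is a guess; may need adjusting
    modalities=["image", "video"],
    do_sample=False,
    max_new_tokens=256,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```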
I see in the code that mm_newline_position can be set to grid, one_token, frame, or no_token. What is the exact meaning of these options?
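As far as I can tell from llava_arch.py, these options control where the learned image-newline token is inserted when video-frame features are flattened into the text sequence: "grid" pools the frames and appends a newline at the end of each row of the resulting 2D grid, "frame" appends a newline after every frame's tokens, "one_token" appends a single newline after all frames, and "no_token" inserts none. A minimal sketch of switching between them, assuming the value is read from model.config rather than passed to generate(), with input_ids and video_tensor prepared as in the OneVision tutorial:

```python
# A sketch, assuming mm_newline_position is read from model.config
# (as in llava_arch.py) rather than accepted by generate() directly.
model.config.mm_newline_position = "one_token"  # or "grid" / "frame" / "no_token"
out = model.generate(input_ids, images=[video_tensor], modalities=["video"],
                     do_sample=False, max_new_tokens=128)
```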
pip install -r requirements.txt
In docs/LLaVA_OneVision_Tutorials.ipynb, how should the code be changed to analyze differences between videos with similar backgrounds but different foreground objects?
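One possible adaptation of the notebook, sketched under the assumption that generate() accepts several videos at once (one frame tensor per "video" entry in `modalities`); tokenizer, model, and image_processor are reused from the tutorial's loading cell, and the paths are hypothetical:

```python
# A sketch extending the tutorial's single-video example to two clips;
# multi-video support via `modalities` is an assumption, not documented.
import copy
import numpy as np
from decord import VideoReader, cpu
from llava.mm_utils import tokenizer_image_token
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates

def load_video(path, num_frames):
    # Uniformly sample frames, as the tutorial's helper does.
    vr = VideoReader(path, ctx=cpu(0))
    idx = np.linspace(0, len(vr) - 1, num_frames, dtype=int).tolist()
    return vr.get_batch(idx).asnumpy()

clips = [load_video(p, 16) for p in ("scene_a.mp4", "scene_b.mp4")]  # hypothetical paths
tensors = [image_processor.preprocess(c, return_tensors="pt")["pixel_values"].half().cuda()
           for c in clips]

question = (f"{DEFAULT_IMAGE_TOKEN}\n{DEFAULT_IMAGE_TOKEN}\n"
            "These two videos share a similar background. Describe how their "
            "foreground objects differ.")
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX,
                                  return_tensors="pt").unsqueeze(0).cuda()

out = model.generate(input_ids, images=tensors, modalities=["video", "video"],
                     do_sample=False, max_new_tokens=256)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```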
I am re-evaluating LLaVA-OneVision 0.5B on ActivityNet-QA and trying to reproduce the reported value of 50.5%. I get the model checkpoint using the following commands:

```python
warnings.filterwarnings("ignore")
pretrained = "lmms-lab/llava-onevision-qwen2-0.5b-ov"
model_name = ...
```
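For comparison, the full loading step in the notebook looks roughly like the sketch below; `model_name = "llava_qwen"` is carried over from docs/LLaVA_OneVision_Tutorials.ipynb and is an assumption for this checkpoint:

```python
# A sketch of the tutorial-style loading step; "llava_qwen" as model_name
# is an assumption taken from the OneVision tutorial notebook.
import warnings
from llava.model.builder import load_pretrained_model

warnings.filterwarnings("ignore")
pretrained = "lmms-lab/llava-onevision-qwen2-0.5b-ov"
model_name = "llava_qwen"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, device_map="auto")
model.eval()
```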
Great work! But I can't find interleave_demo.py in playground/demo/ as your docs instruct.
It seems that decord is preventing the installation from completing. How can I fix this? When I run `pip install -e ".[train]"` I get the following:

```
Obtaining file:///Users/bill/Documents/uni/2025%20Fall%20Semester%20/Research%20499/lLaVA-NEXT_project/LLaVA-NeXT...
```
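If the blocker is decord's lack of prebuilt wheels for Apple Silicon macOS (which the file:///Users/... path suggests), one commonly suggested workaround is the eva-decord fork, which publishes arm64 wheels and keeps the same `decord` import name. Note that pip will still try to resolve the `decord` entry in the project's dependency list, so removing or commenting it out first is part of this sketch:

```
# A possible workaround, assuming the failure is decord failing to build on
# Apple Silicon: install the eva-decord fork, remove/comment the "decord"
# entry in pyproject.toml, then retry the editable install.
pip install eva-decord
pip install -e ".[train]"
```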
Are there any scripts, or a link to a guide, for training the llava-next-llama-8B model?
Hi all, I have set everything up with the LLaVA-NeXT repo, and I want to run the pretraining code for the OneVision dataset. However, when I run the code...
I found this issue when working with the lmms-lab/llava-onevision-qwen2-7b-ov model and qwen2vl (the transformers library is the latest version).

### Code

```python
import json
import argparse
from PIL import Image
import ...
```