LLaVA-NeXT
Hi, thanks for your interest in this project! Since this repository is maintained by multiple paper authors with limited bandwidth, we may not be able to review or respond to...
Hi, is the prompt version in the pre-training stage of OneVision (see https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/pretrain_clip.sh) set to `plain` on purpose? Should it not be `qwen_2`? If it is done on purpose, could...
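In case it helps anyone comparing the two, here is a minimal sketch that prints what each prompt version renders to; it assumes the repo's `llava.conversation` module and that both `plain` and `qwen_2` are registered keys in `conv_templates`:

```python
# Sketch: print what each prompt version actually renders to.
# Assumes llava.conversation.conv_templates contains both keys.
from llava.conversation import conv_templates

for version in ("plain", "qwen_2"):
    conv = conv_templates[version].copy()
    conv.append_message(conv.roles[0], "<image>\nDescribe the image.")
    conv.append_message(conv.roles[1], None)
    print(f"--- {version} ---")
    print(repr(conv.get_prompt()))
```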
When I use LLaVA-Video for inference, loading the model gives this error:
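Without the full traceback it is hard to say more, but for comparison, here is a minimal loading sketch assuming the repo's `load_pretrained_model` builder; the checkpoint and `model_name` below are examples, not necessarily the poster's exact setup:

```python
# Minimal loading sketch for a LLaVA-Video checkpoint, assuming the
# repo's builder API in llava/model/builder.py.
from llava.model.builder import load_pretrained_model

model_path = "lmms-lab/LLaVA-Video-7B-Qwen2"  # example checkpoint
tokenizer, model, image_processor, max_length = load_pretrained_model(
    model_path,
    None,           # model_base: None when loading full (non-LoRA) weights
    "llava_qwen",   # model_name selects which architecture is built
    device_map="auto",
)
model.eval()
```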
Model: llava-onevision-qwen2-0.5B-ov. The grounding results show a very noticeable offset, and the training set seems to contain only a very small proportion of grounding data?
Thanks for your work! I've downloaded the LLaVA-Video-178K dataset and I want to pick several **specific types of questions** for my research, according to Fig. 3 in your paper. It seems...
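A minimal filtering sketch, assuming the annotations are a JSON list of entries; `question_type` is a hypothetical field name here, so check the actual keys in the release you downloaded:

```python
# Sketch: filter LLaVA-Video-178K annotations down to chosen question types.
# "question_type" and the file paths are hypothetical -- verify against
# the actual schema of the downloaded annotation files.
import json

WANTED_TYPES = {"temporal", "causal"}  # example categories, adjust per Fig. 3

with open("annotations.json") as f:    # hypothetical local path
    entries = json.load(f)

subset = [e for e in entries if e.get("question_type") in WANTED_TYPES]
print(f"kept {len(subset)} of {len(entries)} entries")

with open("annotations_subset.json", "w") as f:
    json.dump(subset, f, indent=2)
```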
When passing both single-image and multi-image inputs in the same batch, the following error occurs:

```
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
```

Is...
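One workaround (a sketch, not the repo's internal fix) is to normalize every sample's image tensor to the same rank before collating, so single-image samples (C, H, W) and multi-image samples (N, C, H, W) batch cleanly:

```python
# Sketch: make every sample's image tensor 4-D (num_images, C, H, W)
# before collating, so single- and multi-image samples share the same rank.
import torch

def normalize_images(images: list[torch.Tensor]) -> list[torch.Tensor]:
    out = []
    for img in images:
        if img.dim() == 3:          # single image: (C, H, W)
            img = img.unsqueeze(0)  # -> (1, C, H, W)
        out.append(img)
    return out
```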
@Luodian

### TL;DR

**I'm trying to fine-tune LLaVA-NeXT OneVision, and when I use the local weights of `google/siglip-so400m-patch14-384`, I get the following shape mismatch error:**

> `RuntimeError: size mismatch for...`
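A size mismatch like this usually means the local copy has a different image size, patch size, or hidden size than the training config expects. A quick sanity check, assuming a standard Hugging Face checkpoint layout in a hypothetical local directory:

```python
# Sketch: verify that local SigLIP weights match the expected
# so400m-patch14-384 geometry before plugging them into training.
from transformers import SiglipVisionModel

LOCAL_DIR = "/path/to/siglip-so400m-patch14-384"  # hypothetical local path

model = SiglipVisionModel.from_pretrained(LOCAL_DIR)
cfg = model.config
print("image_size:", cfg.image_size)    # expected 384
print("patch_size:", cfg.patch_size)    # expected 14
print("hidden_size:", cfg.hidden_size)  # expected 1152 for so400m

pos = model.vision_model.embeddings.position_embedding.weight
print("position_embedding:", tuple(pos.shape))  # expected (729, 1152): (384/14)^2 patches
```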
Hello, I am trying to run the lmms-lab/LLaVA-NeXT-Video-32B-Qwen model on an A100-40GB GPU. However, I encounter an OOM issue when loading the model in its default configuration. To address this,...
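Two common mitigations are 4-bit quantization and sharding across devices. A sketch, assuming the builder exposes `load_4bit` and `device_map` as in the original LLaVA loader (verify against `llava/model/builder.py`):

```python
# Sketch: two common ways to fit a 32B checkpoint on a 40 GB card.
# Assumes the repo's builder accepts load_4bit / device_map like the
# original LLaVA loader.
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, max_length = load_pretrained_model(
    "lmms-lab/LLaVA-NeXT-Video-32B-Qwen",
    None,
    "llava_qwen",
    load_4bit=True,      # quantize weights to 4-bit via bitsandbytes
    device_map="auto",   # or shard across GPUs/CPU offload if available
)
```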
I tried to run the `video_demo.sh` script on my own video. I only modified the video path without changing any other parameters:

```
bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 32 2...
```
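For anyone inspecting what the script does with its positional arguments, here is a sketch of uniform frame sampling with decord, assuming the `32` above is the number of frames drawn from the video (check the script's argument list to confirm):

```python
# Sketch of uniform frame sampling, assuming "32" in the demo invocation
# is the frame count. Uses decord, which LLaVA-NeXT's video pipeline
# typically relies on.
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path: str, num_frames: int = 32) -> np.ndarray:
    vr = VideoReader(video_path, ctx=cpu(0))
    idx = np.linspace(0, len(vr) - 1, num_frames).astype(int)
    return vr.get_batch(idx).asnumpy()  # (num_frames, H, W, 3), uint8

frames = sample_frames("my_video.mp4")  # hypothetical path
print(frames.shape)
```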