LLaVA-NeXT
Hi, thanks for your great work. When training on a single node and saving intermediate checkpoints, I can use `resume_from_checkpoint` to continue training. However, when using multiple nodes...
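One multi-node-specific pitfall worth ruling out: DeepSpeed/ZeRO checkpoints are sharded across ranks, so every node must see the same checkpoint directory (e.g. via a shared filesystem) for the resume to work. Below is a minimal diagnostic sketch, my own helper rather than anything in LLaVA-NeXT, assuming a `torchrun` launch and a hypothetical checkpoint path:

```python
# Minimal diagnostic sketch (not part of LLaVA-NeXT): before passing
# resume_from_checkpoint on a multi-node run, confirm every rank can
# actually see the checkpoint directory.
import os
import torch
import torch.distributed as dist

def checkpoint_visible_everywhere(ckpt_dir: str) -> bool:
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    # 1.0 if this rank sees the directory, 0.0 otherwise; MIN-reduce across ranks
    visible = torch.tensor([float(os.path.isdir(ckpt_dir))], device="cuda")
    dist.all_reduce(visible, op=dist.ReduceOp.MIN)
    ok = bool(visible.item())
    if dist.get_rank() == 0:
        print(f"checkpoint visible on all ranks: {ok}")
    dist.destroy_process_group()
    return ok

if __name__ == "__main__":
    checkpoint_visible_everywhere("./checkpoints/checkpoint-1000")  # hypothetical path
```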
I used the [script](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_si.sh) to fine-tune the `llava-onevision-qwen2-0.5b-si` model on the `blip_laion_cc_sbu_558k.json` dataset. I then used the newly saved checkpoint to run inference tests on a few simple images using the [Tutorial Code](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb)....
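For context, the inference test is roughly the following, condensed from the linked tutorial; the checkpoint path, image file, and question are placeholders, and `qwen_1_5` is the conversation template the tutorial uses for the OneVision Qwen2 models:

```python
import copy
import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

pretrained = "./checkpoints/llava-onevision-qwen2-0.5b-si-ft"  # placeholder path
tokenizer, model, image_processor, _ = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto")
model.eval()

image = Image.open("test.jpg")  # placeholder image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device="cuda") for t in image_tensor]

# Build the prompt with the qwen_1_5 conversation template.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(
    conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to("cuda")

out = model.generate(input_ids, images=image_tensor, image_sizes=[image.size],
                     do_sample=False, max_new_tokens=256)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```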
**Description:** When running inference using this model, the model performs as expected in 0-shot (20 images + text) and 1-shot (40 images + text) settings, producing properly formatted outputs. However,...
This model could not be found on Hugging Face: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/349ebb64c5c4286cec57708689274aff07b8c74f/scripts/video/train/SO400M_Qwen2_7B_ov_to_video_am9.sh#L29
I am trying [the tutorial](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb) for video input. I am using the same example (i.e., the jobs.mp4) with the model weights from `lmms-lab/llava-onevision-qwen2-7b-ov`, but the model outputs `The!! video!!!!!!!!!!!`...
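For reference, the video path in the tutorial looks roughly like the sketch below; it reuses `model`, `tokenizer`, `image_processor`, and the `input_ids` construction from the image sketch earlier on this page, and the frame count is a placeholder. One common thing to rule out with degenerate output like this is a dtype mismatch between the frame tensor and the fp16 model (the `.to(dtype=...)` line below):

```python
import numpy as np
import torch
from decord import VideoReader, cpu

# Uniformly sample num_frames frames from the clip, as the tutorial does.
def load_video(video_path: str, num_frames: int) -> np.ndarray:
    vr = VideoReader(video_path, ctx=cpu(0))
    idx = np.linspace(0, len(vr) - 1, num_frames, dtype=int).tolist()
    return vr.get_batch(idx).asnumpy()  # shape: (num_frames, H, W, 3)

frames = load_video("jobs.mp4", 16)  # 16 frames is a placeholder choice
video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"]
video_tensor = video_tensor.to(dtype=torch.float16, device="cuda")  # must match the model dtype

out = model.generate(
    input_ids,               # built with the qwen_1_5 template, as in the image sketch
    images=[video_tensor],   # one stacked (T, C, H, W) tensor per video
    modalities=["video"],    # treat the batch of frames as a single video
    do_sample=False,
    max_new_tokens=256,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```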
Hi, I would like to fine-tune `llava-onevision-qwen2-0.5b-si` on my own dataset. During inference after training the model, a warning appears stating: "Some weights of LlavaQwenForCausalLM were not initialized from...
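To see exactly which parameter names that warning covers, `from_pretrained` accepts `output_loading_info=True`. A small diagnostic sketch, with the checkpoint path as a placeholder and the import path assumed from the repo layout:

```python
import torch
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

# Sketch: surface the keys behind the "not initialized" warning.
# missing_keys are parameters the checkpoint lacks (freshly initialized);
# unexpected_keys are checkpoint tensors the model has no slot for.
model, loading_info = LlavaQwenForCausalLM.from_pretrained(
    "./checkpoints/llava-onevision-qwen2-0.5b-si-ft",  # placeholder path
    torch_dtype=torch.float16,
    output_loading_info=True,
)
print("missing keys:", loading_info["missing_keys"][:20])
print("unexpected keys:", loading_info["unexpected_keys"][:20])
```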
Hi guys, am I right in saying that the code does not take advantage of anyres for video? It follows that each video frame is rescaled to the image encoder's resolution...
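To make the asymmetry concrete, here is a toy illustration, not the repo's actual preprocessing; the 2x2 tile grid and the 384-px input size (SigLIP SO400M) are assumptions. Anyres feeds the encoder native-resolution crops plus a global view per image, while the video path gives it exactly one rescaled view per frame:

```python
from PIL import Image

BASE = 384  # assumed SigLIP SO400M input size

def video_frame_view(frame: Image.Image) -> list[Image.Image]:
    # Video path: each frame is simply rescaled to the tower's base resolution.
    return [frame.resize((BASE, BASE))]

def anyres_views(image: Image.Image, grid=(2, 2)) -> list[Image.Image]:
    # Image path (anyres, simplified): native-resolution tiles + one global view.
    w, h = image.size
    views = []
    for gy in range(grid[1]):
        for gx in range(grid[0]):
            box = (gx * w // grid[0], gy * h // grid[1],
                   (gx + 1) * w // grid[0], (gy + 1) * h // grid[1])
            views.append(image.crop(box).resize((BASE, BASE)))
    views.append(image.resize((BASE, BASE)))  # global view
    return views

img = Image.new("RGB", (1280, 720))
print(len(video_frame_view(img)), "view per frame vs",
      len(anyres_views(img)), "views per image")
```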
We adopted the official [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) codebase and the official training dataset [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) to evaluate foundational vision models. The language model is Qwen2.5-7B.

| Vision Tower | RoPE2D | ChartQA |...
Hi, I'm going to train LLaVA-NeXT (image version) myself before fine-tuning. The authors provide a pre-training script, `./scripts/train/pretrain_clip.sh`, as follows.

```
export OMP_NUM_THREADS=8
export NCCL_IB_DISABLE=0
export NCCL_IB_GID_INDEX=3...
```