LLaVA-NeXT
Hi, thanks for your great work. When training on a single node and saving intermediate checkpoints, I can use `resume_from_checkpoint` to continue training. However, when using multiple nodes...
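One multi-node-specific pitfall worth ruling out: DeepSpeed/ZeRO checkpoints are sharded across ranks, so every node must see the same checkpoint directory (e.g. via a shared filesystem) for the resume to work. Below is a minimal diagnostic sketch, my own helper rather than anything in LLaVA-NeXT, assuming a `torchrun` launch and a hypothetical checkpoint path:

```python
# Minimal diagnostic sketch (not part of LLaVA-NeXT): before passing
# resume_from_checkpoint on a multi-node run, confirm every rank can
# actually see the checkpoint directory.
import os
import torch
import torch.distributed as dist

def checkpoint_visible_everywhere(ckpt_dir: str) -> bool:
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    # 1.0 if this rank sees the directory, 0.0 otherwise; MIN-reduce across ranks
    visible = torch.tensor([float(os.path.isdir(ckpt_dir))], device="cuda")
    dist.all_reduce(visible, op=dist.ReduceOp.MIN)
    ok = bool(visible.item())
    if dist.get_rank() == 0:
        print(f"checkpoint visible on all ranks: {ok}")
    dist.destroy_process_group()
    return ok

if __name__ == "__main__":
    checkpoint_visible_everywhere("./checkpoints/checkpoint-1000")  # hypothetical path
```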
I used the [script](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_si.sh) to fine-tune the `llava-onevision-qwen2-0.5b-si` model on the `blip_laion_cc_sbu_558k.json` dataset. I then used the newly saved checkpoint to run inference tests on a few simple images using the [Tutorial Code](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb)....
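For context, the inference test is roughly the following, condensed from the linked tutorial; the checkpoint path, image file, and question are placeholders, and `qwen_1_5` is the conversation template the tutorial uses for the OneVision Qwen2 models:

```python
import copy
import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

pretrained = "./checkpoints/llava-onevision-qwen2-0.5b-si-ft"  # placeholder path
tokenizer, model, image_processor, _ = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto")
model.eval()

image = Image.open("test.jpg")  # placeholder image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device="cuda") for t in image_tensor]

# Build the prompt with the qwen_1_5 conversation template.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
input_ids = tokenizer_image_token(
    conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to("cuda")

out = model.generate(input_ids, images=image_tensor, image_sizes=[image.size],
                     do_sample=False, max_new_tokens=256)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```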
**Description:** When running inference using this model, the model performs as expected in 0-shot (20 images + text) and 1-shot (40 images + text) settings, producing properly formatted outputs. However,...
This model could not be found on Hugging Face: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/349ebb64c5c4286cec57708689274aff07b8c74f/scripts/video/train/SO400M_Qwen2_7B_ov_to_video_am9.sh#L29
I am trying [the tutorial](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb) for video input. I am using the same example (i.e., the jobs.mp4) with the model weights from `lmms-lab/llava-onevision-qwen2-7b-ov`, but the model outputs `The!! video!!!!!!!!!!!`...
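For reference, the video path in the tutorial looks roughly like the sketch below; it reuses `model`, `tokenizer`, `image_processor`, and the `input_ids` construction from the image sketch earlier on this page, and the frame count is a placeholder. One common thing to rule out with degenerate output like this is a dtype mismatch between the frame tensor and the fp16 model (the `.to(dtype=...)` line below):

```python
import numpy as np
import torch
from decord import VideoReader, cpu

# Uniformly sample num_frames frames from the clip, as the tutorial does.
def load_video(video_path: str, num_frames: int) -> np.ndarray:
    vr = VideoReader(video_path, ctx=cpu(0))
    idx = np.linspace(0, len(vr) - 1, num_frames, dtype=int).tolist()
    return vr.get_batch(idx).asnumpy()  # shape: (num_frames, H, W, 3)

frames = load_video("jobs.mp4", 16)  # 16 frames is a placeholder choice
video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"]
video_tensor = video_tensor.to(dtype=torch.float16, device="cuda")  # must match the model dtype

out = model.generate(
    input_ids,               # built with the qwen_1_5 template, as in the image sketch
    images=[video_tensor],   # one stacked (T, C, H, W) tensor per video
    modalities=["video"],    # treat the batch of frames as a single video
    do_sample=False,
    max_new_tokens=256,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```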
Hi, I would like to fine-tune `llava-onevision-qwen2-0.5b-si` on my own dataset. During inference after training the model, a warning appears stating: "Some weights of LlavaQwenForCausalLM were not initialized from...
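To see exactly which parameter names that warning covers, `from_pretrained` accepts `output_loading_info=True`. A small diagnostic sketch, with the checkpoint path as a placeholder and the import path assumed from the repo layout:

```python
import torch
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

# Sketch: surface the keys behind the "not initialized" warning.
# missing_keys are parameters the checkpoint lacks (freshly initialized);
# unexpected_keys are checkpoint tensors the model has no slot for.
model, loading_info = LlavaQwenForCausalLM.from_pretrained(
    "./checkpoints/llava-onevision-qwen2-0.5b-si-ft",  # placeholder path
    torch_dtype=torch.float16,
    output_loading_info=True,
)
print("missing keys:", loading_info["missing_keys"][:20])
print("unexpected keys:", loading_info["unexpected_keys"][:20])
```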
Hi guys, am I right in saying that the code does not take advantage of anyres for video? It follows that each video frame is rescaled to the image encoder's resolution...
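To make the asymmetry concrete, here is a toy illustration, not the repo's actual preprocessing; the 2x2 tile grid and the 384-px input size (SigLIP SO400M) are assumptions. Anyres feeds the encoder native-resolution crops plus a global view per image, while the video path gives it exactly one rescaled view per frame:

```python
from PIL import Image

BASE = 384  # assumed SigLIP SO400M input size

def video_frame_view(frame: Image.Image) -> list[Image.Image]:
    # Video path: each frame is simply rescaled to the tower's base resolution.
    return [frame.resize((BASE, BASE))]

def anyres_views(image: Image.Image, grid=(2, 2)) -> list[Image.Image]:
    # Image path (anyres, simplified): native-resolution tiles + one global view.
    w, h = image.size
    views = []
    for gy in range(grid[1]):
        for gx in range(grid[0]):
            box = (gx * w // grid[0], gy * h // grid[1],
                   (gx + 1) * w // grid[0], (gy + 1) * h // grid[1])
            views.append(image.crop(box).resize((BASE, BASE)))
    views.append(image.resize((BASE, BASE)))  # global view
    return views

img = Image.new("RGB", (1280, 720))
print(len(video_frame_view(img)), "view per frame vs",
      len(anyres_views(img)), "views per image")
```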
We adopted the official [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) codebase and the official training dataset [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) to evaluate foundational vision models. The language model is Qwen2.5-7B.

| Vision Tower | RoPE2D | ChartQA |...
Hi, I'm going to train LLaVA-NeXT (image version) myself before fine-tuning. The authors provide a pre-training script, `./scripts/train/pretrain_clip.sh`, as follows.

```
export OMP_NUM_THREADS=8
export NCCL_IB_DISABLE=0
export NCCL_IB_GID_INDEX=3...
```