
315 LLaVA-NeXT issues

I am trying to use the llama3-llava-next-8b model, and I replaced --model-path with the local path of llama3-llava-next-8b that I downloaded. When I run python -m llava.serve.model_worker --host 0.0.0.0 --controller...
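
As a first sanity check that a locally downloaded checkpoint loads at all (outside the serving stack), here is a minimal Python sketch, assuming the repo's loader helpers follow the upstream LLaVA API (`load_pretrained_model`, `get_model_name_from_path`); the local path is a placeholder:

```python
# Minimal sketch: load llama3-llava-next-8b from a local directory.
# Assumes load_pretrained_model / get_model_name_from_path behave as in
# upstream LLaVA; the path below is a placeholder for your download.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "/path/to/llama3-llava-next-8b"
model_name = get_model_name_from_path(model_path)  # should contain "llava" so the right loader is picked

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, model_name, device_map="auto"
)
print(type(model).__name__, context_len)
```

If this loads cleanly, the same local path should also be usable as the `--model-path` argument of the model worker.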

Hey! I want to use LLaVA-OV to do some inference. I read the paper to find the training prompt for each type of question (Table 18), but the full prompt...

I'm experiencing high memory usage in the DataLoader workers when using a custom dataset class for lazy loading large datasets. This leads to Out-of-Memory (OOM) errors during training. I've observed...
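
A common culprit is keeping every decoded sample inside the dataset object, which each DataLoader worker then effectively duplicates (copy-on-write after fork). Below is a minimal sketch of a lazy dataset that stores only byte offsets and decodes one record per `__getitem__`; the JSONL layout with image paths is a hypothetical example, not this repo's format:

```python
# Minimal sketch of a lazy Dataset that avoids holding decoded samples in RAM.
# Hypothetical file layout: one JSON record per line, each with an image path.
import json
from PIL import Image
from torch.utils.data import Dataset

class LazyJsonlDataset(Dataset):
    def __init__(self, jsonl_path):
        # Keep only byte offsets in memory, not the decoded records themselves.
        self.path = jsonl_path
        self.offsets = []
        offset = 0
        with open(jsonl_path, "rb") as f:
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Decode the record (and its image) only when a worker asks for it.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            record = json.loads(f.readline())
        image = Image.open(record["image"]).convert("RGB")
        return record["conversations"], image
```

Storing the offsets in a numpy array rather than a Python list further limits copy-on-write growth across workers.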

When I fine-tune using LoRA, the model does not converge well. The hyperparameters are set as follows: --lora_enable True \ --deepspeed scripts/zero3.json \ --model_name_or_path ${MODEL} \ --version ${PROMPT_VERSION} \...

Following the evaluation section, the current llava no longer seems to have llava_vid, and there appears to be a similar error under lmms-eval [#242 in lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval/issues/242). I'd like to ask whether there is a quick workaround, and if I need to adapt it myself, how should I go about it?

How do I run inference on multiple images? I tried to input 2 images but encountered an error: ValueError: Number of image tokens in input_ids (1) different from num_images (2). Here...
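
That error usually means the prompt contained one image placeholder while two images were passed. Below is a minimal sketch that inserts one placeholder per image, assuming the upstream LLaVA-style helpers (`tokenizer_image_token`, `process_images`, `conv_templates`) and the `"qwen_1_5"` template name; adjust to the model you load, and note that `tokenizer`, `model`, and `image_processor` come from `load_pretrained_model`:

```python
# Minimal sketch of two-image inference: one <image> placeholder per input image.
# Helper names and the conversation template are assumptions based on upstream LLaVA.
import torch
from PIL import Image
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

images = [Image.open(p).convert("RGB") for p in ["left.jpg", "right.jpg"]]  # placeholder paths

# One image placeholder per input image, followed by the question.
question = (DEFAULT_IMAGE_TOKEN + "\n") * len(images) + "What differs between the two images?"
conv = conv_templates["qwen_1_5"].copy()   # template name is an assumption
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

image_tensors = process_images(images, image_processor, model.config)
if isinstance(image_tensors, list):  # anyres preprocessing may return a list of tensors
    image_tensors = [t.to(model.device, dtype=torch.float16) for t in image_tensors]
else:
    image_tensors = image_tensors.to(model.device, dtype=torch.float16)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensors,
        image_sizes=[img.size for img in images],
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```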

Hi, thanks for your work. When I run the demo code from https://huggingface.co/lmms-lab/LLaVA-Video-72B-Qwen2 in your LLaVA-NeXT repository, some errors occur: ``` size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape...

Hi, thanks for the effort and amazing work! I want to download some parts of the M4-Instruct dataset, hosted [here](https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data/tree/main). The following `.zip` files are available: ``` AESOP.zip ALFRED.zip...
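
To fetch only selected archives rather than cloning the whole dataset repo, `huggingface_hub.snapshot_download` with an `allow_patterns` filter is one option; the file names below are just examples taken from the list above:

```python
# Minimal sketch: download only selected .zip archives from the dataset repo.
# Requires `pip install huggingface_hub`; the patterns are examples.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lmms-lab/M4-Instruct-Data",
    repo_type="dataset",
    allow_patterns=["AESOP.zip", "ALFRED.zip"],  # add the parts you actually need
    local_dir="M4-Instruct-parts",
)
```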

I would like to know whether training the single-image version of llava-ov requires reading the training data in the order given in single_image.yaml, or whether the order is random.

I fine-tuned llava-onevision from lmms-lab/llava-onevision-qwen2-7b-ov with the config --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 and have a checkpoint saved. How can I use this model for eval? ![image](https://github.com/user-attachments/assets/4a962bbf-64ac-49e3-a49a-8565cb34d1b0)
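
One way to evaluate such a checkpoint is to merge the LoRA adapter back into its base model and point the eval tooling (e.g. lmms-eval) at the merged weights. A minimal sketch, assuming `load_pretrained_model` merges LoRA weights when `model_base` is given and the checkpoint name contains "lora" (as in upstream LLaVA's merge script); the paths are placeholders:

```python
# Minimal sketch: merge a LoRA checkpoint into its base model and save the
# result so standard eval tooling can load it by path.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

lora_path = "checkpoints/llava-onevision-qwen2-7b-ov-lora"   # your saved LoRA checkpoint
base_path = "lmms-lab/llava-onevision-qwen2-7b-ov"           # the model you fine-tuned from

tokenizer, model, image_processor, context_len = load_pretrained_model(
    lora_path, base_path, get_model_name_from_path(lora_path)
)

merged_dir = "checkpoints/llava-onevision-qwen2-7b-ov-merged"
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```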