LLaVA-NeXT
Thanks for making this repo! Really helpful for a project I'm working on. However, when generating in a batch, there are a couple of issues. The first is the missing...
I am conducting a replication experiment on 2024-05-25-llava-next-ablations/#vision-encoders, using the scripts under the train folder in the current repository. I would like to ask which LLM is used in this...
I am using lmms-lab/LLaVA-Video-7B-Qwen2 from HF with the sample code to process a local 2 MB video. When executing cont = model.generate, the following exception is raised: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.03 GiB. GPU 3 has a total capacity of 11.76 GiB of which 814.31 MiB is...
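A common mitigation for this kind of OOM on an ~12 GiB card is to feed the model fewer video frames (each frame adds vision tokens) and run generation in half precision. A minimal sketch of uniform frame subsampling, where the 16-frame cap and the 1 fps decode rate are illustrative assumptions rather than values from the report:

```python
import numpy as np

def sample_frame_indices(total_frames: int, max_frames: int = 16) -> np.ndarray:
    """Uniformly subsample frame indices so fewer frames (and thus fewer
    vision tokens) reach the model, lowering peak GPU memory."""
    if total_frames <= max_frames:
        return np.arange(total_frames)
    return np.linspace(0, total_frames - 1, max_frames, dtype=int)

# e.g. a 2-minute clip decoded at 1 fps -> 120 frames, keep only 16 of them
print(sample_frame_indices(120, max_frames=16))
```

Loading the model in fp16/bf16 (or 4-bit) and reducing max_new_tokens are further knobs worth trying before moving to a larger GPU.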
When training LLaVA_OneVision, why do I need to load vision_tower (`siglip`) as well as LLaVA_OneVision's own model parameters (`lmms-lab/qwen2-0.5b-si`)? Could it be that the model parameters of LLaVA_OneVision itself (`lmms-lab/qwen2-0.5b-si`)...
What is the difference between these two models: llava-onevision-qwen2-7b-ov-hf vs llava-onevision-qwen2-7b-si-hf?
Hello, when I tried to load and fine-tune the checkpoints of llava-onevision-0.5B, I couldn't find the weights for the LLM head. Could it be that the weights for...
The llava model requires the modalities parameter to be broadcast to the batch size, otherwise the zip statement on line 442 in llava/model/llava_arch.py reduces the batch size to 1 (the...
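The truncation happens because Python's zip() stops at the shortest iterable, so pairing a full batch with a single-element modalities list silently keeps only one sample. A minimal, self-contained sketch of the failure and the broadcast fix (the sample list and "image" value are illustrative):

```python
# zip() stops at the shortest iterable, so pairing 4 samples with a
# 1-element modalities list yields only 1 pair -> the batch collapses.
samples = ["s0", "s1", "s2", "s3"]

modalities = ["image"]                       # not broadcast
print(len(list(zip(samples, modalities))))   # 1

modalities = ["image"] * len(samples)        # broadcast to the batch size
print(len(list(zip(samples, modalities))))   # 4 -> full batch preserved
```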
Hi, I found that the default image preprocessing method only handles single-image input. In the `process_images` function, `process_anyres_image` is used as the default preprocessor, which will cause...
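One way to handle multiple images under the anyres path is to apply the single-image preprocessor once per image and keep the outputs in a list, since anyres tiling can produce a different number of patches per image and the results cannot always be stacked into one tensor. A minimal sketch with a hypothetical stand-in for the per-image call (preprocess_single is not a real function in this repo):

```python
import torch

def preprocess_single(image) -> torch.Tensor:
    """Hypothetical stand-in for a per-image preprocessor such as
    process_anyres_image; here it just returns a dummy tensor."""
    return torch.zeros(3, 336, 336)

def preprocess_batch(images) -> list[torch.Tensor]:
    # Run the single-image path on each image and collect the results in a
    # list rather than stacking, because patch counts may differ per image.
    return [preprocess_single(img) for img in images]

print(len(preprocess_batch([object(), object()])))  # 2
```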
Is there a way to get a structured output from this VLM like we can with OpenAI: https://platform.openai.com/docs/guides/structured-outputs It could also be achieved with function calling or a JSON schema, but...
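To my knowledge the repo does not expose constrained decoding, so a prompt-level workaround is to ask the model for JSON matching a small schema and validate the reply yourself. A minimal sketch, where the schema fields and the example reply are purely illustrative:

```python
import json

# Instruction appended to the user prompt; the field names are assumptions,
# not a built-in schema of the model or repo.
schema_hint = (
    'Answer ONLY with JSON of the form '
    '{"caption": string, "num_people": integer}.'
)

def parse_structured(reply: str) -> dict:
    """Extract and validate the JSON object from a model reply."""
    start, end = reply.find("{"), reply.rfind("}") + 1
    data = json.loads(reply[start:end])
    assert isinstance(data["caption"], str)
    assert isinstance(data["num_people"], int)
    return data

print(parse_structured('Sure! {"caption": "two hikers on a ridge", "num_people": 2}'))
```

Libraries that enforce grammars at decode time (e.g. regex/JSON-constrained sampling) are another option, but they need access to the underlying generate loop.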
Hi, The following list includes the [missing datasets](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/tree/main) in onevision yaml file. Can I know where to download them? Thanks - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/llava_ofa_DEMON-FULL_filtered_311085.json - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/llava_ofa_mantis-instruct_reformatted.json - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/MathV360K_VQA-AS_5907.json -...