LLaVA-NeXT
Thanks for making this repo! Really helpful for a project I'm working on. However, when generating in a batch, there are a couple of issues. The first is the missing...
I am conducting a replication experiment on 2024-05-25-llava-next-ablations/#vision-encoders, using the scripts under the train folder in the current repository. I would like to ask which LLM is used in this...
I am using lmms-lab/LLaVA-Video-7B-Qwen2 from HF with the sample code to process a local 2 MB video. When executing cont = model.generate, the following exception is raised: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.03 GiB. GPU 3 has a total capacity of 11.76 GiB of which 814.31 MiB is...
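A common mitigation for this kind of OOM on an ~12 GiB card is to feed the model fewer video frames (each frame adds vision tokens) and run generation in half precision. A minimal sketch of uniform frame subsampling, where the 16-frame cap and the 1 fps decode rate are illustrative assumptions rather than values from the report:

```python
import numpy as np

def sample_frame_indices(total_frames: int, max_frames: int = 16) -> np.ndarray:
    """Uniformly subsample frame indices so fewer frames (and thus fewer
    vision tokens) reach the model, lowering peak GPU memory."""
    if total_frames <= max_frames:
        return np.arange(total_frames)
    return np.linspace(0, total_frames - 1, max_frames, dtype=int)

# e.g. a 2-minute clip decoded at 1 fps -> 120 frames, keep only 16 of them
print(sample_frame_indices(120, max_frames=16))
```

Loading the model in fp16/bf16 (or 4-bit) and reducing max_new_tokens are further knobs worth trying before moving to a larger GPU.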
When training LLaVA_OneVision, why do I need to load vision_tower (`siglip`) as well as LLaVA_OneVision's own model parameters (`lmms-lab/qwen2-0.5b-si`)? Could it be that the model parameters of LLaVA_OneVision itself (`lmms-lab/qwen2-0.5b-si`)...
What is the difference between these two models: llava-onevision-qwen2-7b-ov-hf vs llava-onevision-qwen2-7b-si-hf?
Hello, when I tried to load and fine-tune the checkpoints of llava-onevision-0.5B, I couldn't find the weights for the LLM head. Could it be that the weights for...
The llava model requires the modalities parameter to be broadcast to the batch size, otherwise the zip statement on line 442 in llava/model/llava_arch.py reduces the batch size to 1 (the...
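The truncation happens because Python's zip() stops at the shortest iterable, so pairing a full batch with a single-element modalities list silently keeps only one sample. A minimal, self-contained sketch of the failure and the broadcast fix (the sample list and "image" value are illustrative):

```python
# zip() stops at the shortest iterable, so pairing 4 samples with a
# 1-element modalities list yields only 1 pair -> the batch collapses.
samples = ["s0", "s1", "s2", "s3"]

modalities = ["image"]                       # not broadcast
print(len(list(zip(samples, modalities))))   # 1

modalities = ["image"] * len(samples)        # broadcast to the batch size
print(len(list(zip(samples, modalities))))   # 4 -> full batch preserved
```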
Hi, I found that the default image preprocessing method only handles single-image input. In the `process_images` function, `process_anyres_image` is used as the default preprocessor, which will cause...
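One way to handle multiple images under the anyres path is to apply the single-image preprocessor once per image and keep the outputs in a list, since anyres tiling can produce a different number of patches per image and the results cannot always be stacked into one tensor. A minimal sketch with a hypothetical stand-in for the per-image call (preprocess_single is not a real function in this repo):

```python
import torch

def preprocess_single(image) -> torch.Tensor:
    """Hypothetical stand-in for a per-image preprocessor such as
    process_anyres_image; here it just returns a dummy tensor."""
    return torch.zeros(3, 336, 336)

def preprocess_batch(images) -> list[torch.Tensor]:
    # Run the single-image path on each image and collect the results in a
    # list rather than stacking, because patch counts may differ per image.
    return [preprocess_single(img) for img in images]

print(len(preprocess_batch([object(), object()])))  # 2
```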
Is there a way to get a structured output from this VLM like we can with OpenAI: https://platform.openai.com/docs/guides/structured-outputs It could also be achieved with function calling or a JSON schema, but...
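To my knowledge the repo does not expose constrained decoding, so a prompt-level workaround is to ask the model for JSON matching a small schema and validate the reply yourself. A minimal sketch, where the schema fields and the example reply are purely illustrative:

```python
import json

# Instruction appended to the user prompt; the field names are assumptions,
# not a built-in schema of the model or repo.
schema_hint = (
    'Answer ONLY with JSON of the form '
    '{"caption": string, "num_people": integer}.'
)

def parse_structured(reply: str) -> dict:
    """Extract and validate the JSON object from a model reply."""
    start, end = reply.find("{"), reply.rfind("}") + 1
    data = json.loads(reply[start:end])
    assert isinstance(data["caption"], str)
    assert isinstance(data["num_people"], int)
    return data

print(parse_structured('Sure! {"caption": "two hikers on a ridge", "num_people": 2}'))
```

Libraries that enforce grammars at decode time (e.g. regex/JSON-constrained sampling) are another option, but they need access to the underlying generate loop.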
Hi, The following list includes the [missing datasets](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/tree/main) in onevision yaml file. Can I know where to download them? Thanks - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/llava_ofa_DEMON-FULL_filtered_311085.json - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/llava_ofa_mantis-instruct_reformatted.json - json_path: /mnt/bn/vl-research/data/llava_instruct/real_vision_flan/MathV360K_VQA-AS_5907.json -...