
Results: 344 LLaVA-NeXT issues, sorted by recently updated

I noticed that the video representation used in LLaVA-Video 7B has shape (64, 679, 1, 2), i.e. 679 tokens per frame, whereas in OV each frame is 729 tokens. Is there some implementation detail that changed between the two?
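
For context on where the 729 figure comes from, here is a back-of-envelope check. The SigLIP-SO400M-384 encoder (a 27x27 patch grid) is an assumption about the OV vision tower, and the 50-token gap is exactly what the question above asks about.

```python
# Back-of-envelope check (assumption: the OV vision tower is
# SigLIP-SO400M at 384px, which emits a 27x27 patch grid).
patches_per_side = 27
ov_tokens_per_frame = patches_per_side ** 2
print(ov_tokens_per_frame)        # 729, the per-frame count in OV
print(ov_tokens_per_frame - 679)  # 50 tokens/frame unaccounted for in LLaVA-Video
```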

**I encountered three problems.** The first is that in the script finetune_ov.sh, in addition to the regular data_path and image_folder, there is also a video_folder. This...
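
A minimal sketch (not the actual LLaVA-NeXT training code) of how a finetune script can accept a video_folder alongside the usual image arguments. Only the three argument names come from the finetune_ov.sh mention above; the resolution rule below is a guess at what the question is probing.

```python
# Hypothetical sketch: how data_path, image_folder, and video_folder
# might fit together in a mixed image/video finetuning script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--data_path", type=str, help="JSON list of mixed samples")
parser.add_argument("--image_folder", type=str, help="root dir for image paths")
parser.add_argument("--video_folder", type=str, help="root dir for video paths")
args = parser.parse_args()

def resolve_media(sample: dict) -> str:
    # Guessed rule: samples with a "video" key resolve against
    # video_folder, samples with an "image" key against image_folder.
    if "video" in sample:
        return f"{args.video_folder}/{sample['video']}"
    return f"{args.image_folder}/{sample['image']}"
```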

Not trying to drive traffic here; I just figured people may have small questions that don't quite warrant an issue, so a group chat could help. If the maintainers ever want it, I'm happy to hand over group admin. My WeChat is dreamingforhope; if the QR code expires you can add me directly. ![image](https://github.com/user-attachments/assets/cf722891-99e7-479a-bbaa-cb4d41ee393a)

How can I get the data below? I can't seem to find it in the HF datasets you have provided so far. llava_wild_4v_39k.json llava_wild_4v_12k.json

The paper says llava-ov-si, but the scripts use llava-ov: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/0070d0ae4931c9b19d9cc57c38e16a87c270a61c/scripts/video/train/SO400M_Qwen2_7B_ov_to_video_am9.sh#L29 Is this a typo?

Environment: accelerate 0.34.2 aiohappyeyeballs 2.4.2 aiohttp 3.10.8 aiosignal 1.3.1 annotated-types 0.7.0 anyio 4.6.0 async-timeout 4.0.3 attrs 24.2.0 av 13.0.0 certifi 2024.8.30 charset-normalizer 3.3.2 click 8.1.7 datasets 2.16.1 decord 0.6.0 deepspeed...

Hi, I have used your codebase to pretrain an MLP on the "Qwen2-7B-Instruct" base model, incorporating the "openai/clip-vit-large-patch14" encoder. The training process was smooth, with a noticeable reduction in loss....
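
A minimal sketch of what such an MLP projector typically looks like, assuming the common LLaVA-style "mlp2x_gelu" design: two linear layers with a GELU in between, mapping CLIP ViT-L/14 features (hidden size 1024) into the Qwen2-7B embedding space (hidden size 3584). This is not the poster's exact code; the dimensions are the published defaults of the two named models.

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer MLP bridging vision features into the LLM embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 3584):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the encoder
        return self.proj(image_features)

# 224px ViT-L/14 yields a 16x16 = 256 patch grid per image.
out = MLPProjector()(torch.randn(1, 256, 1024))
print(out.shape)  # torch.Size([1, 256, 3584])
```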

Removed `tokenizer = copy.deepcopy(tokenizer)` from `preprocess_llama3` and `preprocess_qwen` because this operation was called every time data was fetched from the dataloader, consuming extra time. Instead, whether to use `copy.deepcopy` during...
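
A minimal sketch of the change this PR describes (class and flag names here are illustrative, not the exact LLaVA-NeXT code): deep-copying the tokenizer inside the per-sample preprocess function runs on every dataloader fetch, so the copy is hoisted to dataset construction, where it runs once.

```python
import copy

def preprocess_qwen(source, tokenizer):
    # After the change: uses the tokenizer it is handed, no per-call deepcopy.
    ...

class LazySupervisedDatasetSketch:
    def __init__(self, sources, tokenizer, copy_tokenizer: bool = True):
        # Copy once up front (if needed at all), not on every __getitem__.
        self.tokenizer = copy.deepcopy(tokenizer) if copy_tokenizer else tokenizer
        self.sources = sources

    def __getitem__(self, i):
        return preprocess_qwen(self.sources[i], self.tokenizer)
```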

Hi, I'm conducting some experiments with the Salamandra-7b-instruct LLM (https://huggingface.co/BSC-LT/salamandra-7b-instruct) using the LLaVA-OneVision framework. However, I've noticed that `LlavaOnevisionForConditionalGeneration` currently only supports Qwen2. Is there any plan to extend support to...
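
A hedged sketch of how to see which language backbone a released OneVision checkpoint is wired to in Transformers; the llava-hf checkpoint id is one of the published conversions on the Hugging Face Hub, and pointing `text_config` at another decoder (e.g. Salamandra) is untested here, which is what this issue requests.

```python
from transformers import AutoConfig

# Inspect the text backbone the released checkpoint is configured with.
cfg = AutoConfig.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")
print(cfg.text_config.model_type)  # "qwen2" on the released checkpoints
```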