Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hi, I tried to reproduce the stage-2 results from the code and observed the following:
0. Both runs use the three datasets declared in the code: cc_sbu_align, llava_instruct, and webvid_instruct.
1. With the pretrain_vicuna7b-v2.pth checkpoint provided in the repo, I can reproduce the normal vicuna7b_stage2 behavior; the cc_sbu_align loss converges to around 0.1.
2. With the pretrain-vicuna13b.pth checkpoint provided in the repo, the trained stage-2 model recognizes images and videos poorly and often gives irrelevant answers; the cc_sbu_align loss fluctuates around 0.7-0.9.
The two experiments differ only in the LLM and the checkpoint; all other hyperparameters are identical (see the sketch below). Are there any special tuning tricks for finetuning the 13B model?
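A minimal sketch of the two runs being compared, to make "only the LLM and the checkpoint differ" concrete. The config path and key names (model.llama_model, model.ckpt) are assumptions based on the repo's MiniGPT-4-style YAML configs, not verified against this exact version.

```python
# Sketch: the only intended difference between the two stage-2 runs.
from omegaconf import OmegaConf

cfg = OmegaConf.load("train_configs/visionbranch_stage2_finetune.yaml")  # assumed path

# Run 1: 7B LLM + 7B stage-1 checkpoint -> cc_sbu_align loss converges to ~0.1.
cfg.model.llama_model = "ckpt/vicuna-7b/"
cfg.model.ckpt = "ckpt/pretrain_vicuna7b-v2.pth"

# Run 2: 13B LLM + 13B stage-1 checkpoint, all other hyperparameters unchanged
# -> cc_sbu_align loss stays around 0.7-0.9 and answers are often off-topic.
# cfg.model.llama_model = "ckpt/vicuna-13b/"
# cfg.model.ckpt = "ckpt/pretrain-vicuna13b.pth"
```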
Are there any plans to add DeepSpeed support later? I see that the LLaMA parameters are currently frozen; when I tried unfreezing them for training, even a batch size of 1 would not fit on an A100 (see the sketch below).
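A short sketch of what "frozen vs. unfrozen LLaMA" means for memory here. The attribute name model.llama_model follows the MiniGPT-4-style layout the repo appears to use and is an assumption, not verified code from this repo.

```python
# Sketch: toggling whether the LLM weights receive gradients.
def set_llama_trainable(model, trainable: bool) -> None:
    # Stage-1/2 training keeps the LLM frozen. Unfreezing a 13B LLM adds
    # gradients plus optimizer states for ~13B parameters, which is why even
    # batch size 1 overflows a single A100 without ZeRO/offload or
    # parameter-efficient tuning (e.g. LoRA).
    for p in model.llama_model.parameters():
        p.requires_grad_(trainable)

def count_trainable(model) -> int:
    # Useful to confirm how many parameters the optimizer will actually hold state for.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```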
I'm not sure whether some of the models were downloaded correctly.
Hi, I want to generate instruction data for my own dataset with GPT-4, but I don't know how to write the code. I also notice that there is a rate limit... (see the sketch below)
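A minimal sketch of one way to do this, assuming the openai>=1.0 Python client and a simple exponential backoff around the rate limit. The prompt wording, file names, and JSONL fields are placeholders for illustration, not the format the Video-LLaMA authors used.

```python
# Sketch: turn captions into GPT-4 instruction data with retry/backoff.
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_gpt4(caption: str, max_retries: int = 5) -> str:
    prompt = f"Write an instruction-answer pair about this video caption:\n{caption}"
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception:
            # Back off exponentially on rate-limit or transient errors.
            time.sleep(2 ** attempt)
    raise RuntimeError("GPT-4 request kept failing after retries")

# Hypothetical input/output files: one JSON object with a "caption" field per line.
with open("my_captions.jsonl") as f, open("instruct_data.jsonl", "w") as out:
    for line in f:
        caption = json.loads(line)["caption"]
        record = {"caption": caption, "instruction": ask_gpt4(caption)}
        out.write(json.dumps(record) + "\n")
```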
Will the performance be worse?
Please check the API endpoints; they appear to be having issues.
Hi, I have a question about the audio input. In Video-LLaMA/video_llama/conversation/conversation_video.py, line 255, I think the input to this function (load_and_transform_audio_data) should be an audio file (.wav), so why is your input...
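For context, a hedged illustration of feeding a .wav to the ImageBind loader by first extracting the audio track from the video. The moviepy usage, file paths, and the import path for the vendored ImageBind data module are assumptions for illustration, not the exact code path taken in conversation_video.py.

```python
# Sketch: extract a .wav from a video, then hand it to ImageBind's audio loader.
from moviepy.editor import VideoFileClip
from video_llama.models.ImageBind.data import load_and_transform_audio_data  # assumed import path

video_path = "examples/sample_video.mp4"  # hypothetical input file
audio_path = "examples/sample_video.wav"

# Extract the audio track to a standalone .wav file at 16 kHz.
VideoFileClip(video_path).audio.write_audiofile(audio_path, fps=16000)

# ImageBind's loader converts the .wav into mel-spectrogram clips for the audio encoder.
audio = load_and_transform_audio_data([audio_path], device="cpu", clips_per_video=8)
print(audio.shape)
```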