Fine-tuning with a custom dataset of multiple text-video pairs.

Open Snarky36 opened this issue 1 year ago • 3 comments

Hello, I would like to fine-tune a T2V model on a custom dataset of prompts and their videos. Could you advise me on which fine-tuning code and which model I should use for that? I have approximately 1300 text-video pairs of sign language. I would very much appreciate a little help, because I don't know where or how to start. Thank you for your time!

Apr 20 '24 22:04 Snarky36

Hi, you can train your model with: python train.py --cfg configs/t2v_train.yaml, but you need to prepare your own dataset in the expected format first.

Apr 21 '24 05:04 Steven-SWZhang

Thank you very much. Where can I find out what the dataset format should look like, so that I can adapt mine?

Apr 21 '24 07:04 Snarky36

Please refer to the toy dataset and the example config.
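
As a rough sketch of the preparation step, the script below turns a folder of text-video pairs into a plain-text list file. It rests on assumptions that are not confirmed in this thread: that the loader reads one entry per line in the form `video_file|||caption`, and that each video has a caption stored next to it in a `.txt` file with the same name. The names `SEPARATOR`, `collect_pairs`, `build_manifest`, and the `videos` folder are hypothetical, so compare the result against the toy dataset and adjust before training.

```python
from pathlib import Path

# Assumption: entries look like "video_file|||caption".
# Check the toy dataset for the real delimiter and layout.
SEPARATOR = "|||"


def collect_pairs(video_dir):
    """Pair each .mp4 in video_dir with a same-named .txt caption file."""
    root = Path(video_dir)
    pairs = []
    if not root.is_dir():
        return pairs
    for video in sorted(root.glob("*.mp4")):
        caption_file = video.with_suffix(".txt")
        if caption_file.exists():
            pairs.append((video.name, caption_file.read_text(encoding="utf-8")))
    return pairs


def build_manifest(pairs, separator=SEPARATOR):
    """Return one 'video<separator>caption' line per usable pair."""
    lines = []
    for video_name, caption in pairs:
        # Collapse newlines and repeated spaces so each entry stays on one line.
        caption = " ".join(caption.split())
        if not caption:
            continue  # skip videos with an empty caption
        if separator in caption:
            raise ValueError(f"caption for {video_name} contains the separator")
        lines.append(f"{video_name}{separator}{caption}")
    return lines


if __name__ == "__main__":
    # "videos" is a hypothetical folder name; point it at your own data.
    for line in build_manifest(collect_pairs("videos")):
        print(line)
```

Redirect the printed lines into a list file and point the dataset entries of your copy of configs/t2v_train.yaml at it, using the key names from the example config.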

Apr 21 '24 07:04 Steven-SWZhang