CogVideo
2B model finetune results
System Info
Information
- [X] The official example scripts
- [ ] My own modified scripts and tasks
Reproduction
- Dataset: https://huggingface.co/datasets/Wild-Heart/Disney-VideoGeneration-Dataset . I used the official Disney dataset and split it so that each video has its own corresponding label file, as required by SAT's README.
- I modified the following files as required: configs/sft.yaml and configs/cogvideox_2b_lora.yaml
- I started fine-tuning with the single-GPU fine-tuning script.
- After 1000 steps, I took the fine-tuned model and followed the official instructions to run inference with it.
- Here are my prompts and the videos I got.

Prompt (1): A classic cartoon black and white scene, the protagonist is a little girl walking on the street, holding a bag of fruits in her hand. A little dog walked towards me, wagging its tail and slowly approaching the little girl. The little girl stopped in her tracks, and the little dog raised its head and stopped for a moment, happily sniffing the fruit in the bag.

Prompt (2): The story unfolds against a black and white background. The scene is a fitness corner in the park, where there are many fitness equipment. An old man wearing a vest and pants is exercising, and in the corner is his little dog sleeping on its stomach.

Result (1):
https://github.com/user-attachments/assets/ae9e45c5-39c4-4d2d-8ce7-b9b76641dd2b

Result (2):
https://github.com/user-attachments/assets/871c2133-8f72-4f0f-9a8c-9a37e9345068
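For anyone trying to reproduce the dataset preparation step, here is a minimal sketch of the kind of split described above. It is not the exact script used. It assumes the downloaded dataset provides two line-aligned text files, one caption per line and one relative video path per line (the names `prompt.txt` and `videos.txt` in the usage note are assumptions), and it assumes the target layout is a `videos/` folder plus a `labels/` folder holding a same-named `.txt` caption for each video. The function names `build_sat_dataset` and `find_unpaired` are hypothetical helpers, not part of CogVideo or SAT.

```python
from pathlib import Path
import shutil


def build_sat_dataset(prompts_file, videos_file, src_root, out_root):
    """Copy each video to out_root/videos and write its caption to out_root/labels.

    Assumes line i of prompts_file is the caption for the video path on
    line i of videos_file (paths are relative to src_root).
    """
    src_root, out_root = Path(src_root), Path(out_root)
    prompts = Path(prompts_file).read_text(encoding="utf-8").splitlines()
    videos = Path(videos_file).read_text(encoding="utf-8").splitlines()
    if len(prompts) != len(videos):
        raise ValueError(
            f"{len(prompts)} captions but {len(videos)} video paths; files are not aligned"
        )

    (out_root / "videos").mkdir(parents=True, exist_ok=True)
    (out_root / "labels").mkdir(parents=True, exist_ok=True)

    written = []
    for caption, rel_path in zip(prompts, videos):
        src = src_root / rel_path.strip()
        # Keep the original file name so the video and its label share a stem.
        shutil.copy2(src, out_root / "videos" / src.name)
        label_path = out_root / "labels" / f"{src.stem}.txt"
        label_path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(src.stem)
    return written


def find_unpaired(out_root):
    """Return the sorted stems that have a video without a label, or a label without a video."""
    out_root = Path(out_root)
    video_stems = {p.stem for p in (out_root / "videos").glob("*.mp4")}
    label_stems = {p.stem for p in (out_root / "labels").glob("*.txt")}
    return sorted(video_stems ^ label_stems)
```

A call such as `build_sat_dataset("data/prompt.txt", "data/videos.txt", "data", "sat_data")` followed by `find_unpaired("sat_data")` is one way to confirm that every clip has a caption before training starts; a non-empty list points at mismatched pairs.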
Expected behavior
I did not get the fine-tuning result I was hoping for: the overall quality of the generated videos dropped significantly after fine-tuning. Could someone point out whether I got any of the steps above wrong, and how I can improve the results? Thank you for your help, and best wishes.