CogVideo
Training details for image-to-video
System Info
First of all, thank you for your excellent work!
Information
- [X] The official example scripts
- [ ] My own modified scripts and tasks
Reproduction
The paper does not describe the image-to-video setup in much detail. I have a few questions that I hope you can answer:
- Starting from the text-to-video model, how much training data is needed to turn it into a text-and-image-to-video model?
- The paper says: "To enhance the model's robustness, we add large noise to the image condition during training." How large should this noise be (for example, what noise scale or sampling distribution is used)?
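To make the second question concrete, here is a minimal sketch of what "adding noise to the image condition" could look like. It is not CogVideo's code. It assumes one common form of conditioning augmentation: a noise level `sigma` is sampled per training example from a log-normal distribution, and the image-condition latent becomes `x + sigma * eps` with `eps ~ N(0, 1)`. The function name `noise_augment_condition` and the defaults `log_sigma_mean` / `log_sigma_std` are hypothetical placeholders. Their actual values are exactly what this issue is asking about.

```python
import math
import random
from typing import List, Optional, Tuple


def noise_augment_condition(
    image_latent: List[float],
    log_sigma_mean: float = -3.0,  # hypothetical placeholder, NOT taken from the paper
    log_sigma_std: float = 0.5,    # hypothetical placeholder, NOT taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Return a noise-augmented copy of an image-condition latent.

    Assumed scheme (illustrative only): draw log(sigma) ~ N(mean, std),
    then add independent Gaussian noise scaled by sigma to every element.
    The sampled sigma is returned too, so a model could optionally be
    conditioned on the noise level as well.
    """
    rng = rng if rng is not None else random.Random()
    # Sample the per-example noise level from a log-normal distribution.
    sigma = math.exp(rng.gauss(log_sigma_mean, log_sigma_std))
    # Perturb each element of the (flattened) condition latent.
    noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in image_latent]
    return noisy, sigma


if __name__ == "__main__":
    latent = [0.5] * 8  # stand-in for a flattened first-frame latent
    noisy, sigma = noise_augment_condition(latent, rng=random.Random(0))
    print(f"sigma={sigma:.4f}", [round(v, 3) for v in noisy])
```

Sampling `sigma` per example, instead of using one fixed value, exposes the model to both lightly and heavily corrupted conditions. That is the robustness argument the quoted sentence makes. Whether CogVideo uses this scheme, and with which parameters, is the open question.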
Expected behavior
Related issues: https://github.com/THUDM/CogVideo/issues/88