Text-To-Video-Finetuning
Default model seems to output only noise or greenscreen
After several unsuccessful fine-tuning attempts where the output was a still frame of noise or a flat green field, I followed the instructions and skipped straight to inference to test the base model. It behaves the same way.
Am I not pointing to the model directory correctly?
!cd /content/Text-To-Video-Finetuning && python inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit"
I also tried !python /content/Text-To-Video-Finetuning/inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit" and got the same output.
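For completeness, here is a minimal sketch of the kind of direct sanity check I have in mind: loading the weights straight through diffusers instead of going through inference.py. This assumes models/model_scope_diffusers is a standard diffusers-format text-to-video checkpoint and that a CUDA GPU is available; the output filename is just an example.

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Path used in the commands above; assumed to contain a diffusers-format checkpoint.
model_dir = "/content/Text-To-Video-Finetuning/models/model_scope_diffusers"

pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
pipe.to("cuda")

# A short, low-step clip is enough to tell real content from pure noise or a flat green field.
result = pipe("cat in a space suit", num_inference_steps=25, num_frames=16)

# On newer diffusers releases the frames come back batched; use result.frames[0] there.
export_to_video(result.frames, "sanity_check.mp4")

If that produces a sensible clip, the weights themselves are fine and the problem is in how inference.py is being invoked or how the output is being viewed.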
Hey there. After training, are you pointing to the trained model?
By default, it should be placed at the script root under ./outputs/train_<date>
What are you using to view the video? I've found there's sometimes something odd about the codec, and the file needs to be opened in an application like VLC.
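If you want to rule out a playback problem entirely, one rough sketch of a check (not part of this repo; the filename is just an example, substitute whatever file inference.py actually wrote) is to decode the clip yourself and look at the raw frames:

import cv2

# Replace with the actual output file from inference.py.
cap = cv2.VideoCapture("output.mp4")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

print(f"decoded {len(frames)} frames")
for i, f in enumerate(frames[:4]):
    # A flat green frame has near-zero std; pure noise has a high std and looks
    # essentially identical from frame to frame.
    print(i, "mean BGR:", f.reshape(-1, 3).mean(axis=0).round(1), "std:", round(float(f.std()), 1))

If the decoded frames look reasonable there, the clip itself is fine and it's just the player; re-encoding it to H.264 with a yuv420p pixel format (e.g. via ffmpeg) usually makes it play in browsers and default players as well.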
Yes, I did try the trained model; I trained two different ones, in fact. Then I thought I would do a sanity check and try to generate an image with the installed "base" model, and that is when I filed this report.
Am I generating output correctly immediately after install with this line? !python /content/Text-To-Video-Finetuning/inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit" If that command is incorrect, I've been on the wrong track.
If you have a lot of videos, you might need to train for longer. How many steps did you train for, and on how many videos? 2,500 steps is not enough if you are training on hundreds of videos, each with a different prompt.
I was actually using images to train the model, and there were only about a dozen of them, so I went the opposite way.
But the problem, as I see it, is that one should be able to generate a clip with the base model through inference before running a training session at all. I ran into issues with that as well, hence this (possibly errant) bug report.