Text-To-Video-Finetuning
Default model seems to output only noise or greenscreen
After several unsuccessful fine-tuning attempts where the output was a still frame of noise or a flat green field, I followed the instructions and skipped straight to inference to test the base model. It behaves the same way.
Am I not pointing to the model directory correctly?
!cd /content/Text-To-Video-Finetuning && python inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit"
I also tried !python /content/Text-To-Video-Finetuning/inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit" and got the same output.
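For completeness, here is a minimal sketch of the kind of direct sanity check I have in mind: loading the weights straight through diffusers instead of going through inference.py. This assumes models/model_scope_diffusers is a standard diffusers-format text-to-video checkpoint and that a CUDA GPU is available; the output filename is just an example.

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Path used in the commands above; assumed to contain a diffusers-format checkpoint.
model_dir = "/content/Text-To-Video-Finetuning/models/model_scope_diffusers"

pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
pipe.to("cuda")

# A short, low-step clip is enough to tell real content from pure noise or a flat green field.
result = pipe("cat in a space suit", num_inference_steps=25, num_frames=16)

# On newer diffusers releases the frames come back batched; use result.frames[0] there.
export_to_video(result.frames, "sanity_check.mp4")

If that produces a sensible clip, the weights themselves are fine and the problem is in how inference.py is being invoked or how the output is being viewed.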
Hey there. After training, are you pointing to the trained model?
By default, it should be placed at the script root under ./outputs/train_<date>
What are you using to view the video? I've found there's sometimes something odd about the codec, and the file needs to be opened in an application like VLC.
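If you want to rule out a playback problem entirely, one rough sketch of a check (not part of this repo; the filename is just an example, substitute whatever file inference.py actually wrote) is to decode the clip yourself and look at the raw frames:

import cv2

# Replace with the actual output file from inference.py.
cap = cv2.VideoCapture("output.mp4")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

print(f"decoded {len(frames)} frames")
for i, f in enumerate(frames[:4]):
    # A flat green frame has near-zero std; pure noise has a high std and looks
    # essentially identical from frame to frame.
    print(i, "mean BGR:", f.reshape(-1, 3).mean(axis=0).round(1), "std:", round(float(f.std()), 1))

If the decoded frames look reasonable there, the clip itself is fine and it's just the player; re-encoding it to H.264 with a yuv420p pixel format (e.g. via ffmpeg) usually makes it play in browsers and default players as well.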
Yes, I did try the trained model; I trained two different ones, in fact. Then I thought I would do a sanity check and try to generate an image with the installed "base" model, and that is when I filed this report.
Am I generating output correctly immediately after install with this line? !python /content/Text-To-Video-Finetuning/inference.py --model /content/Text-To-Video-Finetuning/models/model_scope_diffusers --prompt "cat in a space suit" If that command is incorrect, I've been on the wrong track.
If you have a lot of videos, you might need to train for longer. How many steps did you train for, and on how many videos? 2,500 steps is not enough if you are training on hundreds of videos, each with a different prompt.
I was actually using images to train the model, and there were only about a dozen of them, so I went the opposite way.
But the problem, as I see it, is that one should be able to generate a clip with the base model through inference before running a training session at all. I ran into issues with that as well, hence this (possibly errant) bug report.