
Feature request

Open justinwking opened this issue 2 years ago • 9 comments

Thank you for making this. It seems to work, and I have a model.

I wanted to ask if there is:

  1. a link to a repository that we can use to generate videos with our new diffusion models, or a small example of how to do it with Python or something like that (a rough sketch is included below).
  2. a way to specify the frame rate of the sample videos. Everything seems to sample at 6-8 fps, so the default 24 fps videos seem too fast to really see what the sample looks like.
  3. if we use a JSON file, do we also need to specify the video folder, or do the JSON's hyperlinks take care of that?

Thank you!
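Regarding the first two points, here is a minimal sketch, not from this repo, of loading a finetuned diffusers-format folder and writing the sample at a chosen frame rate. It assumes a diffusers release that includes the ModelScope text-to-video pipeline, plus imageio with imageio-ffmpeg installed; the folder path, prompt, and fps value are placeholders.

    # Rough sketch: sample from a finetuned diffusers-format text-to-video model
    # and save the frames at a frame rate you pick (so the sample is easier to inspect).
    import torch
    import imageio
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "./outputs/train_xxx",   # hypothetical path to your finetuned diffusers folder
        torch_dtype=torch.float16,
    ).to("cuda")

    # The text-to-video pipeline returns the generated frames in .frames
    # (the exact structure can differ between diffusers versions).
    frames = pipe(
        "a corgi running on the beach",  # placeholder prompt
        num_inference_steps=25,
        num_frames=16,
    ).frames

    # Write the sample at a slower rate than 24 fps so it is easier to watch.
    imageio.mimsave("sample.mp4", frames, fps=8)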

justinwking avatar Apr 24 '23 10:04 justinwking

Hi! As for the first point, there's a webui plugin for Auto1111, https://github.com/deforum-art/sd-webui-text2video, with a GUI where you can specify anything for your generation. To convert your finetuned models for use in that GUI, use the script in this repo: https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/convert_diffusers_to_original_ms_text_to_video.py

kabachuha avatar Apr 24 '23 11:04 kabachuha

Thank you, kabachuha. For convert_diffusers_to_original_ms_text_to_video.py, what arguments do I need to pass? Should I point it at the root folder of the model, or directly at the .bin files for the UNet and text encoder? And do I need to specify an output folder? Thank you!

justinwking avatar Apr 24 '23 11:04 justinwking

python convert_diffusers_to_original_ms_text_to_video.py --model_path path-to-your-diffusers-model-folder --checkpoint_path text2video_pytorch_model.pth --clip_checkpoint_path clip.ckpt

Don't use the resulting clip.ckpt, though; it isn't converted well at the moment, and I need to remove it from the requirements.

kabachuha avatar Apr 24 '23 11:04 kabachuha

So should I put in the clip checkpoint path and just not use the clip file that is created, or should I leave the clip checkpoint path blank?

justinwking avatar Apr 24 '23 17:04 justinwking

@justinwking use this branch for now, until it's merged: https://github.com/kabachuha/Text-To-Video-Finetuning/tree/patch-1

kabachuha avatar Apr 24 '23 18:04 kabachuha

Sorry to ask such basic questions, but I couldn't find the files you suggested I include, so I am guessing they have different names. At the bottom of this post I have written out my interpretation of what I think you meant; please correct me if I am mistaken. This is my folder structure:

Text to video Fine Tuning

- Models
  - Model_scope_diffusers
    - Scheduler
    - Text_encoder
    - Tokenizer
    - Unet
    - Vae
- Outputs
  - Train 2023…
    - Cached Latents
    - Checkpoint 2500
    - Checkpoint 5000
      - Scheduler
      - Text_encoder
      - Tokenizer
      - Unet
      - Vae
    - Lora
    - Samples

Does the following command look correct if I run everything from the Text-To-Video-Finetuning folder?

python ./utils/convert_diffusers_to_original_ms_text_to_video.py --model_path models/model_scope_diffusers/ --checkpoint_path outputs/Train2023…/Lora/5000_unet.pt --clip_checkpoint_path outputs/Train2023…/Lora/5000_text_encoder.pt

justinwking avatar Apr 24 '23 18:04 justinwking

Use this folder as the --model_path: "./Outputs/Train 2023…/Checkpoint 5000"

kabachuha avatar Apr 24 '23 19:04 kabachuha

Good morning. I believe I was able to get the script to work with your instructions, but I didn't see a new folder created. What do I need to do to get this into a format and location that t2v can use? All the file names are different, and the folder structure is different. Is this something the script could do?

justinwking avatar Apr 25 '23 11:04 justinwking

I haven't been able to find a readme that explains the process; maybe there is one that I overlooked.

The following was generated when I did the training:

Configuration saved in ./outputs\train_2023-04-24T00-05-34\vae\config.json
Model weights saved in ./outputs\train_2023-04-24T00-05-34\vae\diffusion_pytorch_model.bin
Configuration saved in ./outputs\train_2023-04-24T00-05-34\unet\config.json
Model weights saved in ./outputs\train_2023-04-24T00-05-34\unet\diffusion_pytorch_model.bin
Configuration saved in ./outputs\train_2023-04-24T00-05-34\scheduler\scheduler_config.json
Configuration saved in ./outputs\train_2023-04-24T00-05-34\model_index.json
04/24/2023 06:13:39 - INFO - main - Saved model at ./outputs\train_2023-04-24T00-05-34 on step 10000

Then I ran the command:

(text2video-finetune) python ./utils/convert_diffusers_to_original_ms_text_to_video.py --model_path "./Outputs/train_2023-04-24T00-05-34/Checkpoint-10000" --checkpoint_path "./Outputs/train_2023-04-24T00-05-34/Lora/10000_unet.pt" --clip_checkpoint_path "./Outputs/train_2023-04-24T00-05-34/Lora/10000_text_encoder.pt"

and the process worked, but I don't know where the new UNET is...

Saving UNET
Operation successfull

But now I don't see anything that looks like the ModelScope folder that I am currently using in Automatic1111, which contains:

configuration.json
open_clip_pytorch_model.bin
README.md
text2video_pytorch_model.pth
VQGAN_autoencoder.pth
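For what it's worth, one way this might be assembled (an assumption on my part, not something confirmed in this thread) is that the conversion script only writes the weight files you point it at, so the other files above would come from the stock ModelScope download, with the converted text2video_pytorch_model.pth copied in alongside them. A hypothetical sketch, with every path a placeholder:

    # Hypothetical sketch: build a ModelScope-style folder for the webui by copying
    # the stock files and adding the converted UNet weights. Adjust paths to your setup.
    import shutil
    from pathlib import Path

    stock = Path("path/to/original/modelscope")            # folder with the files listed above
    converted = Path("text2video_pytorch_model.pth")       # written by the conversion script
    target = Path("stable-diffusion-webui/models/ModelScope/t2v-finetuned")  # assumed webui location

    target.mkdir(parents=True, exist_ok=True)
    for name in ("configuration.json", "open_clip_pytorch_model.bin", "VQGAN_autoencoder.pth"):
        shutil.copy2(stock / name, target / name)

    # The finetuned, converted weights replace the original text2video checkpoint.
    shutil.copy2(converted, target / "text2video_pytorch_model.pth")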

justinwking avatar Apr 25 '23 17:04 justinwking