Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
Is there any possible way to have the same Nvidia implementation, using the SD models / DreamBooth models as a base for a txt2vid model? https://research.nvidia.com/labs/toronto-ai/VideoLDM/ I saw this unofficial implementation,...
Hi, ExponentialML! As you probably know, a bit more than a week ago Microsoft published a paper describing the novel DiffusionOverDiffusion technique (https://arxiv.org/abs/2303.12346), which works by first outlining the...
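For readers unfamiliar with the idea, here is a rough sketch of the coarse-to-fine generation scheme that paper describes; the `global_model` and `local_model` callables are placeholders standing in for the paper's global and local diffusion models, not an API from this repo or from diffusers:

```python
# Rough sketch of the coarse-to-fine "diffusion over diffusion" scheme from
# https://arxiv.org/abs/2303.12346 (NUWA-XL). The model callables are
# placeholders, not a real API.
def diffusion_over_diffusion(prompt, global_model, local_model, depth=2):
    # First pass: the global model outlines sparse keyframes spanning the
    # whole video from the text prompt alone.
    frames = global_model(prompt)
    # Each refinement pass fills in new frames between every pair of adjacent
    # frames, conditioned on both endpoints, increasing temporal density.
    for _ in range(depth):
        filled = []
        for first, second in zip(frames[:-1], frames[1:]):
            filled.append(first)
            filled.extend(local_model(prompt, first, second))
        filled.append(frames[-1])
        frames = filled
    return frames
```

Because every local pass only conditions on two nearby keyframes, the scheme can in principle extend a video indefinitely without the global model ever seeing the full frame count, which is what makes it attractive for longer generations.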
Do you have any knowledge of [VideoLDM](https://research.nvidia.com/labs/toronto-ai/VideoLDM/), and is it possible to integrate its algorithms to further enhance the capabilities of current models, such as generating longer videos?
Thank you for making this. It seems to work, and I have a model. I wanted to ask if there is: 1) a link to a repository that we can...
After several unsuccessful attempts at fine-tuning, where the output was a still frame of noise or a green field, I followed the instructions and skipped ahead to inference to test the...
At [this line in utils/dataset.py](https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/dataset.py#L580), the cached latent should be loaded with an explicit `map_location`:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cached_latent = torch.load(self.cached_data_list[index], map_location=device)
```

Otherwise, in multi-GPU distributed training, the first GPU may occupy excessive VRAM compared to the other GPUs.
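A minimal standalone sketch of the same idea, assuming a DDP run launched with torchrun (which sets `LOCAL_RANK` for each process); `path_to_cached_latent` is a placeholder, not a name from the repo:

```python
import os
import torch

# In a torchrun/DDP launch, each process reads its own LOCAL_RANK, so cached
# latents that were saved from cuda:0 get remapped to the local GPU instead
# of every rank piling its tensors onto the first device.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

path_to_cached_latent = "cached/latent_0000.pt"  # placeholder path
cached_latent = torch.load(path_to_cached_latent, map_location=device)
```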
while the validation output during training looks good. Are there any bugs in the inference code, or is it caused by a different diffusers version?
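One quick way to rule out version drift is to print the installed diffusers version and run a minimal inference pass directly against the finetuned weights. A sketch, assuming the finetuned pipeline was saved in diffusers format under `./outputs/train` (a placeholder path):

```python
import diffusers
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Check the installed version first; a mismatch between the training and
# inference environments is a common source of garbage output.
print(diffusers.__version__)

# Placeholder path: wherever the finetuned pipeline was saved.
pipe = DiffusionPipeline.from_pretrained("./outputs/train", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # keeps VRAM usage modest

result = pipe("a dog running on the beach", num_frames=16)
# Depending on the diffusers version, .frames is either the frame list itself
# or a batch that needs indexing (.frames[0]) -- another symptom of API drift.
video_frames = result.frames
export_to_video(video_frames, "sample.mp4")
```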
When using an existing CLIP checkpoint in ModelScope format, remap the trained layers so the checkpoint maintains its integrity and does not fail to load.