Text-To-Video-Finetuning

Finetune ModelScope's Text To Video model using Diffusers 🧨

Results 28 Text-To-Video-Finetuning issues

Is there any possible way to have the same Nvidia implementation of using the SD models / DreamBooth models as a base for the txt2vid model? https://research.nvidia.com/labs/toronto-ai/VideoLDM/ I saw this unofficial implementation,...

Hi, Exponential-ML! As you probably know, a bit more than a week ago, Microsoft published their paper where they described the novel DiffusionOverDiffusion technique https://arxiv.org/abs/2303.12346 working by firstly outlining the...

enhancement

Do you have any knowledge of [VideoLDM](https://research.nvidia.com/labs/toronto-ai/VideoLDM/), and is it possible to integrate its algorithms to further enhance the capabilities of current models, such as generating longer videos?

enhancement

Thank you for making this. It seems to work, and I have a model. I wanted to ask if there is: 1) a link to a repository that we can...

enhancement

After several unsuccessful attempts at fine-tuning where the output was a still frame of noise or a green field, I followed instructions and skipped to the inference to test the...

bug

At [link](https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/dataset.py#L580), set `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")` and load with `cached_latent = torch.load(self.cached_data_list[index], map_location=device)`. Otherwise, in multi-GPU distributed training, the first GPU may occupy excessive VRAM compared to the other GPUs.

bug
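
The `map_location` suggestion in the issue above can be sketched as a small standalone example. This is a minimal illustration, not the repository's code: `pick_device` and `load_cached_latent` are hypothetical helper names, and reading `LOCAL_RANK` from the environment assumes a launcher such as `torchrun` that sets it. Without `map_location`, `torch.load` restores a tensor to the device it was saved from, which is one plausible reason every worker would end up allocating on the first GPU.

```python
import os
import tempfile

import torch


def pick_device() -> torch.device:
    """Hypothetical helper: choose this process's device.

    Assumes the launcher exports LOCAL_RANK (torchrun does); falls back to
    index 0 when it is unset and to the CPU when CUDA is unavailable.
    """
    if torch.cuda.is_available():
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        return torch.device(f"cuda:{local_rank}")
    return torch.device("cpu")


def load_cached_latent(path: str, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: load a cached latent onto an explicit device.

    map_location remaps the saved storage at load time instead of restoring
    it to whatever device it was saved from.
    """
    return torch.load(path, map_location=device)


if __name__ == "__main__":
    # Round-trip a dummy latent through a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "latent.pt")
        torch.save(torch.zeros(4, 8, 8), path)
        latent = load_cached_latent(path, pick_device())
        print(latent.shape, latent.device)
```

Loading with `map_location="cpu"` and moving the batch to the GPU later in the training loop is another common choice; which one fits depends on how the dataset workers are set up.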

while the validation output during training seems to be good. Are there any bugs in the inference code? Or is it due to a different diffusers version?

bug

Using an existing CLIP checkpoint in ModelScope format, change only the trained layers, so that the checkpoint maintains its integrity and does not fail to load.