Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
Is there any possible way to have the same Nvidia implementation, using the SD models / DreamBooth models as a base for a txt2vid model? https://research.nvidia.com/labs/toronto-ai/VideoLDM/ I saw this unofficial implementation,...
Hi, ExponentialML! As you probably know, a bit more than a week ago Microsoft published a paper describing the novel DiffusionOverDiffusion technique (https://arxiv.org/abs/2303.12346), which works by first outlining the...
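For readers unfamiliar with the idea, here is a rough sketch of the coarse-to-fine generation scheme that paper describes; the `global_model` and `local_model` callables are placeholders standing in for the paper's global and local diffusion models, not an API from this repo or from diffusers:

```python
# Rough sketch of the coarse-to-fine "diffusion over diffusion" scheme from
# https://arxiv.org/abs/2303.12346 (NUWA-XL). The model callables are
# placeholders, not a real API.
def diffusion_over_diffusion(prompt, global_model, local_model, depth=2):
    # First pass: the global model outlines sparse keyframes spanning the
    # whole video from the text prompt alone.
    frames = global_model(prompt)
    # Each refinement pass fills in new frames between every pair of adjacent
    # frames, conditioned on both endpoints, increasing temporal density.
    for _ in range(depth):
        filled = []
        for first, second in zip(frames[:-1], frames[1:]):
            filled.append(first)
            filled.extend(local_model(prompt, first, second))
        filled.append(frames[-1])
        frames = filled
    return frames
```

Because every local pass only conditions on two nearby keyframes, the scheme can in principle extend a video indefinitely without the global model ever seeing the full frame count, which is what makes it attractive for longer generations.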
Do you have any knowledge of [VideoLDM](https://research.nvidia.com/labs/toronto-ai/VideoLDM/), and is it possible to integrate its algorithms to further enhance the capabilities of current models, such as generating longer videos?
Thank you for making this. It seems to work, and I have a model. I wanted to ask if there is: 1) a link to a repository that we can...
After several unsuccessful attempts at fine-tuning, where the output was a still frame of noise or a green field, I followed the instructions and skipped ahead to inference to test the...
At [this line in utils/dataset.py](https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/dataset.py#L580), the cached latent should be loaded with an explicit `map_location`:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cached_latent = torch.load(self.cached_data_list[index], map_location=device)
```

Otherwise, in multi-GPU distributed training, the first GPU may occupy excessive VRAM compared to the other GPUs.
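A minimal standalone sketch of the same idea, assuming a DDP run launched with torchrun (which sets `LOCAL_RANK` for each process); `path_to_cached_latent` is a placeholder, not a name from the repo:

```python
import os
import torch

# In a torchrun/DDP launch, each process reads its own LOCAL_RANK, so cached
# latents that were saved from cuda:0 get remapped to the local GPU instead
# of every rank piling its tensors onto the first device.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

path_to_cached_latent = "cached/latent_0000.pt"  # placeholder path
cached_latent = torch.load(path_to_cached_latent, map_location=device)
```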
while the validation output during training looks good. Are there any bugs in the inference code, or is it caused by a different diffusers version?
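One quick way to rule out version drift is to print the installed diffusers version and run a minimal inference pass directly against the finetuned weights. A sketch, assuming the finetuned pipeline was saved in diffusers format under `./outputs/train` (a placeholder path):

```python
import diffusers
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Check the installed version first; a mismatch between the training and
# inference environments is a common source of garbage output.
print(diffusers.__version__)

# Placeholder path: wherever the finetuned pipeline was saved.
pipe = DiffusionPipeline.from_pretrained("./outputs/train", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # keeps VRAM usage modest

result = pipe("a dog running on the beach", num_frames=16)
# Depending on the diffusers version, .frames is either the frame list itself
# or a batch that needs indexing (.frames[0]) -- another symptom of API drift.
video_frames = result.frames
export_to_video(video_frames, "sample.mp4")
```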
When using an existing CLIP checkpoint in ModelScope format, remap the trained layers so the checkpoint maintains its integrity and does not fail to load.