Request to implement FreeNoise, a new diffusion scheduler
Model/Pipeline/Scheduler description
FreeNoise is a tuning-free and time-efficient paradigm for longer video generation built on top of pretrained video diffusion models. In other words, it is another "free lunch": it can generate longer videos, conditioned on multiple prompts, with negligible additional inference cost.
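Roughly, the core idea (as I read it from the paper, not the authors' code) is to reschedule the initial noise: sample noise for one base window of frames and fill the rest of the sequence by repeating that window with a shuffled frame order, so distant frames share correlated noise. A minimal sketch:

```python
import torch

def free_noise_latents(shape, window=16, generator=None):
    # shape = (batch, channels, num_frames, height, width)
    latents = torch.randn(shape, generator=generator)
    num_frames = shape[2]
    for start in range(window, num_frames, window):
        end = min(start + window, num_frames)
        # Reuse the base window's noise, but shuffle which frame goes where.
        perm = torch.randperm(window, generator=generator)[: end - start]
        latents[:, :, start:end] = latents[:, :, perm]
    return latents

# e.g. 64 frames of 4-channel 64x64 latents, built from a 16-frame noise window
latents = free_noise_latents((1, 4, 64, 64, 64))
```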
Open source status
- [X] The model implementation is available
- [X] The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
- Project: http://haonanqiu.com/projects/FreeNoise.html
- Paper: https://arxiv.org/abs/2310.15169
- Code: https://github.com/arthur-qiu/LongerCrafter
Sounds like a cool idea!
cc @DN6 here
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It can be a plug-in of AnimateDiff: https://github.com/arthur-qiu/FreeNoise-AnimateDiff
@DN6 Could you help apply it to AnimateDiff in diffusers?
Hi @arthur-qiu Is FreeNoise a new type of model or scheduler? I noticed that there is a checkpoint in your repo to load it?
Hi @DN6, FreeNoise is a kind of scheduler: you add several lines of code to AnimateDiff, and it can then generate longer videos without any additional training or tuning. The checkpoint is just for the base model (which can be AnimateDiff, VideoCrafter1, or another video model).
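For illustration, one way this might eventually be exposed to users in diffusers is a simple opt-in switch on AnimateDiffPipeline. The snippet below is only a sketch; `enable_free_noise()` and its arguments are hypothetical placeholders, not an existing diffusers API:

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")

# Hypothetical opt-in: rescheduled noise + windowed temporal attention, so the
# 16-frame motion module can be reused for longer clips without retraining.
pipe.enable_free_noise(context_length=16, context_stride=4)

frames = pipe(
    "a panda surfing on a wave, high quality",
    num_frames=64,
    num_inference_steps=25,
).frames[0]
```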
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @arthur-qiu. Sorry for the delay here. I do think FreeNoise is a good candidate to add to AnimateDiff.
Are the major changes related to the temporal transformer https://github.com/arthur-qiu/FreeNoise-AnimateDiff/blob/e01d82233c595ce22f1a5eba487911c345ce7b5b/animatediff/models/motion_module.py#L251
And creating the latents? https://github.com/arthur-qiu/FreeNoise-AnimateDiff/blob/e01d82233c595ce22f1a5eba487911c345ce7b5b/animatediff/pipelines/pipeline_animation.py#L290
I will look into how we could integrate this.
@DN6 Yes, I mainly edited those two parts (the local window in the temporal transformer and the creation of the latents). As a reference, ComfyUI smoothly integrates FreeNoise as an option for AnimateDiff (https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved/issues/118), and users have obtained some nice results (https://twitter.com/toyxyz3/status/1751256029961335218?s=20).
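For reference, the temporal-transformer change amounts to computing attention over sliding windows of frames and fusing the overlapping outputs. A rough sketch of that idea (my reading of it, not the code from the repo; the paper uses a weighted blend rather than a plain average):

```python
import torch

def windowed_temporal_attention(hidden_states, attn, window=16, stride=4):
    # hidden_states: (batch * height * width, num_frames, channels)
    num_frames = hidden_states.shape[1]
    if num_frames <= window:
        return attn(hidden_states)

    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:  # make sure the tail frames are covered
        starts.append(num_frames - window)

    out = torch.zeros_like(hidden_states)
    counts = torch.zeros(num_frames, 1, device=hidden_states.device, dtype=hidden_states.dtype)
    for start in starts:
        end = start + window
        # The motion module only ever sees window-sized sequences.
        out[:, start:end] += attn(hidden_states[:, start:end])
        counts[start:end] += 1
    return out / counts  # simple average over overlapping windows
```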
Hi @arthur-qiu I had a question about the memory requirements for FreeNoise. I see that the local window is applied in the temporal transformer, but the remaining UNet layers would still need to process the full latent sequence? e.g., if passing in a latent to generate 512 frames, the ResNet and 2D attention layers would have to handle the entire sequence at once, right?
Producing a long sequence on a GPU with 16GB VRAM is likely not possible with the current implementation?
And Motion Injection is not supported by the current implementation either? At least not in the AnimateDiff codebase? https://github.com/arthur-qiu/FreeNoise-AnimateDiff
Yes, all frames need to be generated together; the memory requirements are similar to directly generating a video of that length without FreeNoise.
Motion Injection is only implemented for VideoCrafter; it is not currently supported for AnimateDiff.
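To make the memory point concrete: the spatial layers fold the frame axis into the batch dimension, so they process every frame regardless of FreeNoise, while only the temporal attention is windowed. A shape-level illustration (assumed layout for a sketch, not diffusers internals):

```python
import torch

batch, channels, num_frames, height, width = 1, 4, 512, 64, 64
x = torch.empty(batch, channels, num_frames, height, width)  # long latent sequence

# Spatial ResNet / 2D attention: (B, C, F, H, W) -> (B * F, C, H, W); cost scales
# with num_frames either way, so VRAM use matches plain long-video generation.
spatial_in = x.permute(0, 2, 1, 3, 4).reshape(batch * num_frames, channels, height, width)

# Temporal attention: (B, C, F, H, W) -> (B * H * W, F, C), then processed in
# windows of e.g. 16 frames, so the attention matrices themselves stay small.
temporal_in = x.permute(0, 3, 4, 2, 1).reshape(batch * height * width, num_frames, channels)
```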
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.