diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
# What does this PR do? Adds NewbieAI support to Diffusers. Adds `pooled_projection_dim` config to Lumina2Transformer2DModel and uses pooled projections from Newbie codebase if it is set to something other...
This PR introduces a new text-to-image pipeline named **NewbiePipeline**, as well as a new NextDiT-based transformer architecture, **NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP**, fully implemented following Diffusers' pipeline and model design principles. ### 🚀 Main...
This PR supports LongSANA: a minute-length real-time video generation model ## Related links: project: https://nvlabs.github.io/Sana/Video code: https://github.com/NVlabs/Sana paper: https://arxiv.org/pdf/2509.24695 ## PR feature: LongSANA uses Causal Linear Attention KV Cache during...
### Describe the bug Pipelines passed to `from_pipe()` are converted to float32 unless `torch_dtype` is specified, leading to higher memory usage and slower inference. ### Reproduction ```python import torch from...
This PR is fixing #12257. Comparison with the original repo When I put `with torch.amp.autocast('cuda', dtype=torch.bfloat16):` onto the transformer only and converted the initial noise's `dtype` into `torch.float32` from `torch.bfloat16`...
# What does this PR do? Fix error in Context Parallelism doc ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the...
## What does this PR do? Fixes #12719 This PR fixes a critical issue where using bitsandbytes quantization with `device_map='balanced'` (or other device_map strategies) on transformers models within diffusers pipelines...
### Describe the bug Note: This might be something for the MVP program https://github.com/huggingface/diffusers/issues/12635 if there's anyone who already has a deep understanding of rotary embeddings and complex numbers. I...
PR: Add LTXI2VLongMultiPromptPipeline (ComfyUI-parity long I2V with multi-prompt sliding windows) What does this PR do? - Introduces a new pipeline LTXI2VLongMultiPromptPipeline providing long-duration image-to-video generation using temporal sliding windows with...
As stated [here](https://github.com/huggingface/diffusers/issues/9490#issuecomment-2369756363), let's close the scheduler gap! Problem statement: - Most new models (if not all of the quality ones) are DiT-based - Implementation of DiT-based models in Diffusers...