Add NewbiePipeline and NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP transformer
This PR introduces a new text-to-image pipeline named NewbiePipeline, as well as a new NextDiT-based transformer architecture, NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP, fully implemented following Diffusers' pipeline and model design principles.
🚀 Main additions
• New pipeline
Adds NewbiePipeline under src/diffusers/pipelines/newbie/.
The pipeline follows the standard Diffusers structure (a DiffusionPipeline subclass) and
supports loading via from_pretrained; see the usage sketch below.
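A minimal usage sketch, assuming the pipeline call follows the usual Diffusers text-to-image signature (the checkpoint id is the one linked at the end of this PR; dtype, device, and generation arguments are illustrative):

```python
import torch

from diffusers import NewbiePipeline

# Checkpoint id taken from the model card linked below; the prompt and
# generation arguments are illustrative assumptions, not pinned values.
pipe = NewbiePipeline.from_pretrained(
    "NewBie-AI/NewBie-image-Exp0.1",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    num_inference_steps=30,
    guidance_scale=4.0,
).images[0]
image.save("newbie_sample.png")
```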
• New transformer architecture
Adds transformer_newbie.py, implementing:
- NextDiT backbone with grouped-query attention (GQA)
- AdaLN-Refiner blocks
- Patch-size 2 vision encoder
- 36 transformer layers
- Hidden dimension of 2304
- WHIT CLIP–style text conditioning
The transformer inherits from ModelMixin, enabling standard save/load, weight
serialization and integration with Diffusers utilities.
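For illustration, a sketch of instantiating the transformer and round-tripping it through the ModelMixin save/load API; the keyword names below (patch_size, num_layers, hidden_size) are assumptions about the config keys exposed by transformer_newbie.py, with values matching the description above:

```python
from diffusers import NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP

# Config keys are hypothetical stand-ins for whatever transformer_newbie.py
# actually exposes; the values mirror the architecture described above.
transformer = NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP(
    patch_size=2,
    num_layers=36,
    hidden_size=2304,
)

# ModelMixin provides the standard save/load round trip.
transformer.save_pretrained("./newbie-transformer")
transformer = NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP.from_pretrained(
    "./newbie-transformer"
)
```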
• RMSNorm implementation
Adds RMSNorm to diffusers.models.components, using a PyTorch fallback and supporting
Apex fused RMSNorm if available.
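The dispatch pattern is roughly the following sketch (not the exact code in components.py): try Apex's fused kernel and fall back to a plain PyTorch implementation otherwise.

```python
import torch
import torch.nn as nn

try:
    # Apex's fused kernel is optional; nothing breaks if it is missing.
    from apex.normalization import FusedRMSNorm as ApexRMSNorm

    _HAS_APEX = True
except ImportError:
    _HAS_APEX = False


class RMSNorm(nn.Module):
    """PyTorch fallback: RMS normalization with a learned scale, no mean centering."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        # Compute the RMS over the last dimension in fp32 for stability.
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x = x.float() * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(input_dtype)


def get_rms_norm(dim: int, eps: float = 1e-6) -> nn.Module:
    # Use the fused Apex implementation when available, otherwise the fallback.
    if _HAS_APEX:
        return ApexRMSNorm(dim, eps=eps)
    return RMSNorm(dim, eps=eps)
```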
• Scheduler compatibility
The pipeline is compatible with FlowMatchEulerDiscreteScheduler without requiring
additional custom scheduler code.
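Because of that, swapping or pinning the scheduler only uses standard Diffusers mechanics; a small sketch, assuming the checkpoint follows the usual subfolder layout:

```python
from diffusers import FlowMatchEulerDiscreteScheduler, NewbiePipeline

# The "scheduler" subfolder name follows the common Diffusers checkpoint layout;
# the checkpoint id is the one linked at the end of this PR.
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "NewBie-AI/NewBie-image-Exp0.1", subfolder="scheduler"
)
pipe = NewbiePipeline.from_pretrained(
    "NewBie-AI/NewBie-image-Exp0.1", scheduler=scheduler
)
```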
🧩 Motivation
This PR provides an implementation of a modern NextDiT-style text-to-image architecture
with high-resolution capability and strong conditioning support.
The goal is to enable researchers and users to load, run, and fine-tune this model
directly through Diffusers with minimal friction.
📁 Files added
```
src/diffusers/models/components.py
src/diffusers/models/transformers/transformer_newbie.py
src/diffusers/pipelines/newbie/pipeline_newbie.py
src/diffusers/pipelines/newbie/__init__.py
```
📁 Files modified
```
src/diffusers/__init__.py
src/diffusers/models/__init__.py
src/diffusers/models/transformers/__init__.py
src/diffusers/pipelines/__init__.py
```
✔ Notes
- No external dependencies required
- Apex is optional; PyTorch RMSNorm is the default path
- The pipeline has been tested locally with from_pretrained and produces expected outputs
- Follows the established structure of Diffusers pipelines & transformer modules
Fixes # (no issue linked)
Before submitting
- [x] I have read the contributor guidelines
- [x] This PR introduces a new pipeline and model
- [x] All necessary registration points are updated
- [x] The implementation is consistent with existing Diffusers conventions
Who can review?
Tagging pipeline & transformer reviewers:
@asomoza @yiyixuxu @sayakpaul
Can you link the original codebase, paper, and some results of this model?
https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
https://github.com/NewBieAI-Lab/NewBie-image-Exp0.1
This model builds on improvements to the Lumina line of research and is based on the NextDiT architecture.
Example:
Thanks for your work!
The PR https://github.com/huggingface/diffusers/pull/12803 is in a better place to be merged. Could you try to collaborate on that PR, instead?