Add NewbiePipeline and NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP transformer

Open E-Anlia opened this issue 3 weeks ago • 3 comments

This PR introduces a new text-to-image pipeline named NewbiePipeline, as well as a new NextDiT-based transformer architecture, NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP, fully implemented following Diffusers' pipeline and model design principles.

🚀 Main additions

• New pipeline: adds NewbiePipeline under src/diffusers/pipelines/newbie/.
The pipeline follows the standard Diffusers structure (it subclasses DiffusionPipeline) and supports loading via from_pretrained.

• New transformer architecture: adds transformer_newbie.py, which implements:

  • NextDiT backbone with grouped-query attention (GQA)
  • AdaLN-Refiner blocks
  • Patch-size 2 vision encoder
  • 36 transformer layers
  • Hidden dimension of 2304
  • WHIT CLIP–style text conditioning

The transformer inherits from ModelMixin, enabling standard save/load, weight serialization and integration with Diffusers utilities.
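For reviewers unfamiliar with grouped-query attention, the idea can be sketched as follows. This is a minimal illustration, not the code in transformer_newbie.py: the class name `GroupedQueryAttention`, the head counts, and the absence of rotary embeddings, masking, and QK normalization are all assumptions made to keep the example short.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GroupedQueryAttention(nn.Module):
    """Illustrative GQA layer: many query heads share fewer key/value heads."""

    def __init__(self, dim: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        assert num_heads % num_kv_heads == 0, "query heads must split evenly into KV groups"
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = dim // num_heads
        # Queries get one projection per head; keys and values get fewer.
        self.q_proj = nn.Linear(dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.out_proj = nn.Linear(num_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.q_proj(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, n, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, n, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so that every query head in a group sees the same K/V.
        groups = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, self.num_heads * self.head_dim)
        return self.out_proj(out)


# Toy sizes for illustration only; the real model uses a hidden size of 2304.
attn = GroupedQueryAttention(dim=64, num_heads=8, num_kv_heads=2)
tokens = torch.randn(2, 5, 64)
output = attn(tokens)
```

The saving comes from the smaller key and value projections (and a smaller KV footprint at inference), while the output keeps the same shape as the input.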

• RMSNorm implementation: adds RMSNorm to diffusers.models.components. The pure-PyTorch implementation is the default path, and Apex's fused RMSNorm is supported when Apex is available.
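The fallback pattern described here can be sketched roughly as below. This is an assumption-laden illustration rather than the contents of components.py: the helper `build_rms_norm`, its `prefer_fused` flag, and the float32 upcast are hypothetical choices, and only `apex.normalization.FusedRMSNorm` is an existing external API.

```python
import torch
from torch import nn

try:
    # Optional fast path; the import simply fails when Apex is not installed.
    from apex.normalization import FusedRMSNorm
    _HAS_APEX = True
except ImportError:
    FusedRMSNorm = None
    _HAS_APEX = False


class RMSNorm(nn.Module):
    """Pure-PyTorch RMSNorm: x / sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 for numerical stability, then cast back.
        x32 = x.float()
        inv_rms = torch.rsqrt(x32.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x32 * inv_rms).to(x.dtype) * self.weight


def build_rms_norm(dim: int, eps: float = 1e-6, prefer_fused: bool = False) -> nn.Module:
    """Hypothetical factory: use Apex only when requested and importable."""
    if prefer_fused and _HAS_APEX:
        return FusedRMSNorm(dim, eps=eps)
    return RMSNorm(dim, eps=eps)


torch.manual_seed(0)
norm = build_rms_norm(8)  # default: PyTorch path
hidden = torch.randn(2, 4, 8)
normed = norm(hidden)
```

With the weight at its initial value of ones, each output vector has a root-mean-square close to 1, which is a quick sanity check for either code path.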

• Scheduler compatibility: the pipeline is compatible with FlowMatchEulerDiscreteScheduler and requires no additional custom scheduler code.
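As background on what that scheduler does, the core of a flow-matching Euler sampler is a single update, `sample + (sigma_next - sigma) * model_output`. The sketch below shows only that update under simplifying assumptions: the function name `flow_match_euler_sample`, the linear sigma schedule, and the toy velocity function are hypothetical, and the real scheduler additionally handles timestep shifting and configuration.

```python
import torch


def flow_match_euler_sample(velocity_fn, latents: torch.Tensor, num_steps: int = 4) -> torch.Tensor:
    """Integrate from sigma=1 (noise) to sigma=0 (data) with plain Euler steps."""
    sigmas = torch.linspace(1.0, 0.0, num_steps + 1)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        velocity = velocity_fn(latents, sigma)  # stands in for the transformer's prediction
        latents = latents + (sigma_next - sigma) * velocity
    return latents


# Toy check: with x_sigma = (1 - sigma) * data + sigma * noise, the exact
# velocity is (noise - data), and Euler integration recovers the data.
torch.manual_seed(0)
data = torch.full((2, 3), 0.5)
noise = torch.randn(2, 3)
result = flow_match_euler_sample(lambda x, s: noise - data, noise.clone(), num_steps=4)
```

Because the update only needs the model output and two sigmas, a pipeline whose transformer predicts a velocity in this convention can reuse the existing scheduler as-is.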

🧩 Motivation

This PR provides an implementation of a modern NextDiT-style text-to-image architecture with high-resolution capability and strong conditioning support.
The goal is to enable researchers and users to load, run, and fine-tune this model directly through Diffusers with minimal friction.

📁 Files added

src/diffusers/models/components.py
src/diffusers/models/transformers/transformer_newbie.py
src/diffusers/pipelines/newbie/pipeline_newbie.py
src/diffusers/pipelines/newbie/__init__.py

📁 Files modified

src/diffusers/__init__.py
src/diffusers/models/__init__.py
src/diffusers/models/transformers/__init__.py
src/diffusers/pipelines/__init__.py

✔ Notes

  • No external dependencies required
  • Apex is optional; PyTorch RMSNorm is the default path
  • The pipeline has been tested locally with from_pretrained and produces expected outputs
  • Follows the established structure of Diffusers pipelines & transformer modules

Fixes # (no issue linked)


Before submitting

  • [x] I have read the contributor guidelines
  • [x] This PR introduces a new pipeline and model
  • [x] All necessary registration points are updated
  • [x] The implementation is consistent with existing Diffusers conventions

Who can review?

Tagging pipeline & transformer reviewers:
@asomoza @yiyixuxu @sayakpaul

E-Anlia, Dec 04 '25 05:12

Can you link the original codebase, paper, and some results of this model?

sayakpaul, Dec 04 '25 11:12

https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
https://github.com/NewBieAI-Lab/NewBie-image-Exp0.1

[image: NewBie_image_Exp0 1_Training]

This model builds on improvements to the Lumina line of research and is based on NextDiT.

Example: [image: newbie_image]

E-Anlia, Dec 05 '25 03:12

Thanks for your work!

The PR https://github.com/huggingface/diffusers/pull/12803 is in a better place to be merged. Could you try to collaborate on that PR, instead?

sayakpaul, Dec 08 '25 04:12