Add Newbie Image support

Open Disty0 opened this issue 1 month ago • 0 comments

What does this PR do?

Adds NewbieAI support to Diffusers.
Adds a pooled_projection_dim config option to Lumina2Transformer2DModel. When it is set to a value other than None, the model uses the pooled projections from the Newbie codebase; when it is None, the pooled projections are not used.
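
To illustrate the opt-in behaviour described above, here is a minimal, dependency-free sketch. It is not the code in this PR: `ToyTransformerConfig` and `build_conditioning` are hypothetical names, and combining the vectors by concatenation is an assumption made only to keep the example small. The only thing taken from the PR is the rule that a `pooled_projection_dim` of None disables the pooled-projection path.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ToyTransformerConfig:
    # Hypothetical stand-in for the real model config. Only the field named in
    # this PR is modeled; None keeps the non-pooled behaviour.
    pooled_projection_dim: Optional[int] = None


def build_conditioning(
    config: ToyTransformerConfig,
    timestep_embedding: Sequence[float],
    pooled_projections: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return a toy conditioning vector for one sample."""
    if config.pooled_projection_dim is None:
        # Default path: pooled projections are ignored entirely.
        return list(timestep_embedding)

    if pooled_projections is None:
        raise ValueError(
            "pooled_projections is required when pooled_projection_dim is set"
        )
    if len(pooled_projections) != config.pooled_projection_dim:
        raise ValueError(
            f"expected {config.pooled_projection_dim} pooled values, "
            f"got {len(pooled_projections)}"
        )

    # Assumption: concatenation is used purely for illustration; the real
    # model may combine the two signals differently.
    return list(timestep_embedding) + list(pooled_projections)


# Usage: the same inputs give different results depending on the config.
default_cfg = ToyTransformerConfig()
pooled_cfg = ToyTransformerConfig(pooled_projection_dim=2)
print(build_conditioning(default_cfg, [0.1, 0.2], [1.0, 2.0]))
print(build_conditioning(pooled_cfg, [0.1, 0.2], [1.0, 2.0]))
```

Gating on None rather than on a separate boolean flag means existing Lumina2 configs, which do not define the field, keep their current behaviour without changes.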

Original NewbieAI model: https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
NewbieAI in Diffusers format: https://huggingface.co/Disty0/NewBie-image-Exp0.1-Diffusers

Known Issues:

JinaClip requires trust_remote_code=True and has to be loaded separately.
JinaClip doesn't work with CPU Offload.

Example code:

import torch
from diffusers import NewbiePipeline
from transformers import AutoModel

device = "cuda"
model_path = "Disty0/NewBie-image-Exp0.1-Diffusers"
# JinaClip needs trust_remote_code=True, so load it separately and pass it to the pipeline.
text_encoder_2 = AutoModel.from_pretrained(
    model_path,
    subfolder="text_encoder_2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe = NewbiePipeline.from_pretrained(
    model_path,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
del text_encoder_2

# Enable memory optimizations.
pipe.enable_model_cpu_offload(device=device)

prompt = """
  <character_1>
  <n>$character_1$</n>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, blue_hair, long_hair, hair_between_eyes, head_tilt, tareme, closed_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, blue_skirt, miniskirt, pleated_skirt, blue_hat, mini_hat, thighhighs, grey_thighhighs, black_shoes, mary_janes</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase</action>
  <position>center_left</position>
  </character_1>

  <character_2>
  <n>$character_2$</n>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, pink_hair, long_hair, very_long_hair, multi-tied_hair, open_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, red_skirt, miniskirt, pleated_skirt, hair_bow, multiple_hair_bows, white_bow, ribbon_trim, ribbon-trimmed_bow, white_thighhighs, black_shoes, mary_janes, bow_legwear, bare_arms</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase, waving</action>
  <position>center_right</position>
  </character_2>

  <general_tags>
  <count>2girls, multiple_girls</count>
  <style>anime_style, digital_art</style>
  <background>white_background, simple_background</background>
  <atmosphere>cheerful</atmosphere>
  <quality>high_resolution, detailed</quality>
  <objects>briefcase</objects>
  <other>alternate_costume</other>
  </general_tags>
"""

negative_prompt = "blurry, worst quality, low quality, deformed hands, bad anatomy, extra limbs, poorly drawn face, mutated, extra eyes, bad proportions"

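# JinaClip (text_encoder_2) doesn't work with CPU offload (see Known Issues),
# so move it to the device manually.
pipe.text_encoder_2 = pipe.text_encoder_2.to(device)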
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    guidance_scale=2.5,
    num_inference_steps=30,
    generator=torch.manual_seed(42),
).images[0]
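# display() is only available in notebook environments (IPython/Jupyter);
# in a plain script, use image.save("output.png") instead.
display(image)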

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[*] Did you read the contributor guideline?
[*] Did you read our philosophy doc (important for complex PRs)?
[ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
[*] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Core library:

Pipelines and pipeline callbacks: @yiyixuxu and @asomoza

Dec 07 '25 15:12 Disty0