Hello, why does the sample preview in aitoolkit look great, while the same prompt looks much worse in ComfyUI? How can I reproduce the preview result in ComfyUI? Thank you!
I can't get the same results in ComfyUI as in the AI Toolkit preview, no matter what I try. I'm using a LoRA trained on Z-Image Turbo.
In AI Toolkit's sampling preview, both prompt adherence and detail are essentially flawless: everything that should be present is there, and everything that shouldn't be is completely absent. But no matter how I adjust the settings in ComfyUI, the result falls far short. It's really perplexing. Could you please help? Thank you. I've also met others in the community facing the same issue, and we're all confused.
Same problem with Z-Image training. The sample in aitoolkit looks better, but when I use the same prompt, steps, guidance_scale, and seed in my own code, I get a different image.
This might be related to the quantization step. The original Z-Image pipeline uses no more than 24 GB of VRAM, so I can load it directly without quantization; when loading via aitoolkit, however, 7-bit quantization becomes mandatory (see the quantized comparison sketch after the sample code below).
The sample code:
import torch
from diffusers import ZImagePipeline  # or the Z-Image repo's own pipeline module, depending on your install

pipe = ZImagePipeline.from_pretrained(
    "/my/path/to/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
).to("cuda")
# Load the trained LoRA: first argument is the directory containing the file,
# weight_name is the file name inside it.
pipe.load_lora_weights("/my/path/to", weight_name="lora.safetensor")

image = pipe(
    prompt=prompt,
    negative_prompt=neg_prompt,
    height=1024,
    width=1024,
    num_inference_steps=20,
    guidance_scale=3.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
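To test the quantization hypothesis above, one option is to quantize the transformer at inference time as well and check whether the output shifts toward what aitoolkit shows. Here is a minimal sketch using optimum-quanto's qfloat8 weights; whether this matches the quantization aitoolkit applies internally, and whether the denoiser lives at pipe.transformer for this pipeline, are assumptions on my part.

from optimum.quanto import freeze, qfloat8, quantize

# Quantize only the denoiser's weights to 8-bit float and freeze them so the
# quantized weights are used for inference (assumption: roughly comparable to
# the quantization aitoolkit forces at load time).
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

image_q = pipe(
    prompt=prompt,
    negative_prompt=neg_prompt,
    height=1024,
    width=1024,
    num_inference_steps=20,
    guidance_scale=3.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]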
In aitoolkit, the default parameters guidance_scale=3 and steps=20 give the best results, while in my code the Z-Image author's recommended values (guidance_scale=0, steps=24) work best. This discrepancy is perplexing given that all other parameters are identical between the two setups.
Could the Z-Image training code be changed to use the pipeline provided by the author? Or could a sampling method be provided that generates images identical to those produced during training?
I've solved the problem. The root cause was that the generator's random state was not being reset between samples; it needs to be explicitly re-seeded with the fixed seed every time it is used.
For example, in the Z-Image training code (extensions_built_in/diffusion_models/z_image/z_image.py, lines 309-319):
img = pipeline(
    prompt_embeds=conditional_embeds.text_embeds,
    negative_prompt_embeds=unconditional_embeds.text_embeds,
    height=gen_config.height,
    width=gen_config.width,
    num_inference_steps=gen_config.num_inference_steps,
    guidance_scale=gen_config.guidance_scale,
    latents=gen_config.latents,
    generator=generator.manual_seed(gen_config.seed),  # explicitly reset the seed before each sample
    **extra,
).images[0]
And if you pass a specific seed, you get the corresponding image.
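For anyone hitting the same thing, here is a minimal sketch of why re-seeding matters, reusing the pipe and prompt from the snippet earlier in this thread: a torch.Generator advances its internal state on every sample, so reusing it without calling manual_seed again produces a different image each time.

import torch

gen = torch.Generator("cuda")
gen.manual_seed(42)

# The first call consumes random state, so the second call starts from a
# different state and yields a different image than the first.
img_a = pipe(prompt=prompt, generator=gen).images[0]
img_b = pipe(prompt=prompt, generator=gen).images[0]

# Re-seeding before every call makes the output reproducible;
# manual_seed() returns the generator itself, so it can be done inline.
img_c = pipe(prompt=prompt, generator=gen.manual_seed(42)).images[0]  # matches img_a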