ai-toolkit lr: 1.0e-04 loss: 0.000e+00]loss is nan while training z-image

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

{
  "type": "diffusion_trainer",
  "training_folder": "D:\\ai-toolkit\\output",
  "sqlite_db_path": "D:\\ai-toolkit\\aitk_db.db",
  "device": "cuda",
  "trigger_word": "wys",
  "performance_log_every": 10,
  "network": {
    "type": "lora",
    "linear": 32,
    "linear_alpha": 32,
    "conv": 16,
    "conv_alpha": 16,
    "lokr_full_rank": true,
    "lokr_factor": -1,
    "network_kwargs": { "ignore_if_contains": [] }
  },
  "save": {
    "dtype": "fp16",
    "save_every": 250,
    "max_step_saves_to_keep": 4,
    "save_format": "diffusers",
    "push_to_hub": false
  },
  "datasets": [
    {
      "folder_path": "D:\\ai-toolkit\\datasets/wys",
      "mask_path": null,
      "mask_min_value": 0.1,
      "default_caption": "",
      "caption_ext": "txt",
      "caption_dropout_rate": 0.05,
      "cache_latents_to_disk": false,
      "is_reg": false,
      "network_weight": 1,
      "resolution": [ 1024 ],
      "controls": [],
      "shrink_video_to_frames": true,
      "num_frames": 1,
      "do_i2v": true,
      "flip_x": false,
      "flip_y": false
    }
  ],
  "train": {
    "batch_size": 1,
    "bypass_guidance_embedding": false,
    "steps": 3000,
    "gradient_accumulation": 1,
    "train_unet": true,
    "train_text_encoder": false,
    "gradient_checkpointing": true,
    "noise_scheduler": "flowmatch",
    "optimizer": "adamw8bit",
    "timestep_type": "weighted",
    "content_or_style": "balanced",
    "optimizer_params": { "weight_decay": 0.0001 },
    "unload_text_encoder": false,
    "cache_text_embeddings": false,
    "lr": 0.0001,
    "ema_config": { "use_ema": false, "ema_decay": 0.99 },
    "skip_first_sample": false,
    "force_first_sample": false,
    "disable_sampling": false,
    "dtype": "fp16",
    "diff_output_preservation": false,
    "diff_output_preservation_multiplier": 1,
    "diff_output_preservation_class": "person",
    "switch_boundary_every": 1,
    "loss_type": "mse"
  },
  "model": {
    "name_or_path": "Tongyi-MAI/Z-Image-Turbo",
    "quantize": false,
    "qtype": "qfloat8",
    "quantize_te": false,
    "qtype_te": "qfloat8",
    "arch": "zimage:turbo",
    "low_vram": false,
    "model_kwargs": {},
    "layer_offloading": false,
    "layer_offloading_text_encoder_percent": 1,
    "layer_offloading_transformer_percent": 1,
    "assistant_lora_path": "ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v1.safetensors"
  },
  "sample": {
    "sampler": "flowmatch",
    "sample_every": 250,
    "width": 1024,
    "height": 1024,
    "samples": [
      { "prompt": "wys\u62ff\u7740\u4e66\u5750\u5728\u516c\u56ed\u7684\u957f\u6905\u4e0a\u9605\u8bfb\uff0c\u6709\u6ed1\u6ed1\u68af" }
    ],
    "neg": "",
    "seed": 42,
    "walk_seed": true,
    "guidance_scale": 1,
    "sample_steps": 8,
    "num_frames": 1,
    "fps": 1
  }
}

Using SQLite database at D:\ai-toolkit\aitk_db.db
Job ID: "46a61b96-a273-4b9b-a0b9-fe87b54a1874"
#############################################
# Running job: wys_lora
#############################################
Running 1 process
Loading ZImage model
Loading transformer
Fetching 3 files: 100%|##########| 3/3 [00:00, ?it/s]
Loading checkpoint shards: 100%|##########| 3/3 [00:20

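The reported config sets "dtype": "fp16" in both the "save" and "train" sections. For anyone comparing their own job file, here is a minimal sketch that lists every "dtype" key in a nested config. The helper name `find_dtype_settings` is hypothetical and not part of ai-toolkit, and the embedded config is a trimmed copy of the one in this report.

```python
import json


def find_dtype_settings(node, path=""):
    """Yield (dotted_path, value) for every key named 'dtype' in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "dtype":
                yield child, value
            else:
                yield from find_dtype_settings(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_dtype_settings(item, f"{path}[{index}]")


# Trimmed copy of the relevant parts of the config from this report.
config = json.loads(
    '{"save": {"dtype": "fp16"}, "train": {"dtype": "fp16", "lr": 0.0001}}'
)

for where, value in find_dtype_settings(config):
    print(where, "=", value)
```

To check a real job file, load it with `json.load` and pass the resulting dictionary to the same function. Whether these settings are related to the NaN loss is not established in this thread.
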
Nov 30 '25 10:11 mikeyoubeach

Are all the sampled images completely black?

My 3080 laptop GPU also has this issue when training with the official model.

It might be caused by the problem described in this issue: https://github.com/Tongyi-MAI/Z-Image/issues/15

With this BF16 version of the model I can sample normal images and train a LoRA: https://huggingface.co/dimitribarbot/Z-Image-Turbo-BF16
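One possible reason a BF16 checkpoint behaves differently is numeric range. This is an assumption, since the linked issue is not quoted in this thread. Half precision (fp16) cannot represent values above 65504, and an overflow to infinity during training commonly turns the loss into NaN, while bfloat16 keeps the exponent range of float32 at lower precision. The sketch below uses only Python's standard library; `fits_in_fp16` and `to_bf16` are illustrative helper names, and `to_bf16` emulates bfloat16 by truncation rather than the rounding real hardware performs.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if value can be packed as a half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the IEEE 754 binary16 format
        return True
    except OverflowError:
        return False


def to_bf16(value: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for sample in (FP16_MAX, 70000.0):
    print(sample, "fits in fp16:", fits_in_fp16(sample), "| as bf16:", to_bf16(sample))
```

The value 70000.0 is out of range for fp16 but stays finite in the emulated bfloat16, at the cost of precision.
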

Nov 30 '25 12:11 666asd

> Are all the sampled images completely black?

I had not enabled sampling during training, and I found that the model failed to generate the expected images when used in ComfyUI. After investigating, I discovered that the model files were corrupted. I re-downloaded the model from https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo and the problem was resolved. The model now trains normally and reports a loss value.
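A corrupted download like this can be detected before training by comparing file hashes. The following is a minimal sketch, assuming the hosting page publishes a SHA-256 value for each weight file. The function names are illustrative, and the expected hash has to be copied from the model page by hand.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_sha256):
    """Compare the local file hash with the published one, ignoring case and spaces."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Calling `is_intact` on each downloaded shard with its published hash returns False for any file that was truncated or altered in transit, which is a signal to download that file again.
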

Nov 30 '25 12:11 mikeyoubeach

Could it be that the model I downloaded is also corrupted? I'll try re-downloading it as well.

Nov 30 '25 12:11 666asd