lr: 1.0e-04 loss: 0.000e+00]loss is nan while training z-image

Open mikeyoubeach opened this issue 1 month ago • 3 comments

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

{
  "type": "diffusion_trainer",
  "training_folder": "D:\\ai-toolkit\\output",
  "sqlite_db_path": "D:\\ai-toolkit\\aitk_db.db",
  "device": "cuda",
  "trigger_word": "wys",
  "performance_log_every": 10,
  "network": {
    "type": "lora",
    "linear": 32,
    "linear_alpha": 32,
    "conv": 16,
    "conv_alpha": 16,
    "lokr_full_rank": true,
    "lokr_factor": -1,
    "network_kwargs": { "ignore_if_contains": [] }
  },
  "save": {
    "dtype": "fp16",
    "save_every": 250,
    "max_step_saves_to_keep": 4,
    "save_format": "diffusers",
    "push_to_hub": false
  },
  "datasets": [
    {
      "folder_path": "D:\\ai-toolkit\\datasets/wys",
      "mask_path": null,
      "mask_min_value": 0.1,
      "default_caption": "",
      "caption_ext": "txt",
      "caption_dropout_rate": 0.05,
      "cache_latents_to_disk": false,
      "is_reg": false,
      "network_weight": 1,
      "resolution": [ 1024 ],
      "controls": [],
      "shrink_video_to_frames": true,
      "num_frames": 1,
      "do_i2v": true,
      "flip_x": false,
      "flip_y": false
    }
  ],
  "train": {
    "batch_size": 1,
    "bypass_guidance_embedding": false,
    "steps": 3000,
    "gradient_accumulation": 1,
    "train_unet": true,
    "train_text_encoder": false,
    "gradient_checkpointing": true,
    "noise_scheduler": "flowmatch",
    "optimizer": "adamw8bit",
    "timestep_type": "weighted",
    "content_or_style": "balanced",
    "optimizer_params": { "weight_decay": 0.0001 },
    "unload_text_encoder": false,
    "cache_text_embeddings": false,
    "lr": 0.0001,
    "ema_config": { "use_ema": false, "ema_decay": 0.99 },
    "skip_first_sample": false,
    "force_first_sample": false,
    "disable_sampling": false,
    "dtype": "fp16",
    "diff_output_preservation": false,
    "diff_output_preservation_multiplier": 1,
    "diff_output_preservation_class": "person",
    "switch_boundary_every": 1,
    "loss_type": "mse"
  },
  "model": {
    "name_or_path": "Tongyi-MAI/Z-Image-Turbo",
    "quantize": false,
    "qtype": "qfloat8",
    "quantize_te": false,
    "qtype_te": "qfloat8",
    "arch": "zimage:turbo",
    "low_vram": false,
    "model_kwargs": {},
    "layer_offloading": false,
    "layer_offloading_text_encoder_percent": 1,
    "layer_offloading_transformer_percent": 1,
    "assistant_lora_path": "ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v1.safetensors"
  },
  "sample": {
    "sampler": "flowmatch",
    "sample_every": 250,
    "width": 1024,
    "height": 1024,
    "samples": [
      { "prompt": "wys\u62ff\u7740\u4e66\u5750\u5728\u516c\u56ed\u7684\u957f\u6905\u4e0a\u9605\u8bfb\uff0c\u6709\u6ed1\u6ed1\u68af" }
    ],
    "neg": "",
    "seed": 42,
    "walk_seed": true,
    "guidance_scale": 1,
    "sample_steps": 8,
    "num_frames": 1,
    "fps": 1
  }
}

Using SQLite database at D:\ai-toolkit\aitk_db.db
Job ID: "46a61b96-a273-4b9b-a0b9-fe87b54a1874"
#############################################
# Running job: wys_lora
#############################################
Running 1 process
Loading ZImage model
Loading transformer
Fetching 3 files: 100%|##########| 3/3 [00:00, ?it/s]
Loading checkpoint shards: 100%|##########| 3/3 [00:20

mikeyoubeach avatar Nov 30 '25 10:11 mikeyoubeach

Are all the sampled images completely black?

My 3080 laptop GPU has the same issue when training with the official model.

It might be caused by the problem described in https://github.com/Tongyi-MAI/Z-Image/issues/15

With this BF16 model I can sample normal images and train a LoRA: https://huggingface.co/dimitribarbot/Z-Image-Turbo-BF16
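For anyone wondering why a BF16 checkpoint could behave differently from an fp16 run (the config above sets "dtype": "fp16"), here is a minimal sketch of one possible mechanism. It is an assumption, not something confirmed in this thread: float16 has a largest finite value of 65504, so anything bigger becomes inf, and inf minus inf is NaN. bfloat16 keeps the float32 exponent range, so it does not overflow at this scale. The helper name `to_fp16` is made up for illustration and only uses the Python standard library.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision; values outside the fp16 range become +/-inf."""
    try:
        # The struct "e" format is IEEE 754 binary16 (half precision).
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values that do not fit in fp16;
        # real fp16 arithmetic would saturate to infinity here instead.
        return math.copysign(math.inf, x)


in_range = to_fp16(FP16_MAX)      # still finite
overflowed = to_fp16(70000.0)     # too large for fp16, becomes inf
poisoned = overflowed - overflowed  # inf - inf is NaN

print(in_range, overflowed, poisoned)
```

If this is what happens inside the model, a single overflowing activation is enough to turn the loss into NaN and the samples black, which would match the symptoms here. Treat it as a hypothesis to check, not a diagnosis.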

666asd avatar Nov 30 '25 12:11 666asd

> Are all the sampled images completely black?

I did not enable sampling during training, but I found that the model failed to generate the expected images when used in ComfyUI. After investigating, I discovered that the model files were corrupted. I re-downloaded the model from https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo and the problem was resolved. Training now runs normally and reports a loss value.
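A quick way to check for a corrupted download before re-fetching everything is to hash the files and compare against a checksum from the hosting page. This is a minimal sketch, assuming the host publishes a SHA-256 for each weight file; the function names and the example file name are hypothetical and not part of ai-toolkit.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-256 with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage would look like `matches_checksum("model-00001.safetensors", "<hash from the model page>")`, where both the file name and the hash are placeholders. A `False` result means the local file differs from the published one and should be downloaded again.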

mikeyoubeach avatar Nov 30 '25 12:11 mikeyoubeach

Could it be that the model I downloaded is also corrupted? I'll try re-downloading it as well.

666asd avatar Nov 30 '25 12:11 666asd