
[Sharing Experience] Training Z-Image LoRA using 12G VRAM ~ 😁

Open juntaosun opened this issue 4 weeks ago • 14 comments

(1) Dataset preparation for training: use image material with a maximum side length of 768 px.

🎉 To minimize VRAM usage: after extensive testing, training succeeded with 6~10 images and produced good LoRA results.

A maximum image side of 1024 may not fit in 12 GB of VRAM, but you can try it!
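
A quick way to prepare the dataset is to batch-resize everything so the longer side is at most 768 px. Below is a minimal sketch assuming Pillow is installed; the folder names are placeholders, not anything ai-toolkit expects.

```python
# Resize every image in a folder so its longer side is at most 768 px,
# preserving aspect ratio. Folder names are placeholders.
from pathlib import Path
from PIL import Image

SRC = Path("dataset_raw")
DST = Path("dataset_768")
DST.mkdir(exist_ok=True)

MAX_SIDE = 768  # 512/768 fits 12 GB cards; 1024 may run out of VRAM

for path in SRC.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")
    scale = MAX_SIDE / max(img.size)
    if scale < 1.0:  # only downscale, never upscale
        img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    img.save(DST / f"{path.stem}.png")
```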

(2) New Job Creation: select Z-Image Turbo, then set your model path. Follow the settings in the screenshot below.

Image

You need to enter your own trigger words! This is just an example!
⚠️ Remember to set Transformer Offload to 0%. We don't use offloading because it throws an error, and we're not sure whether this is a bug.

Image

Correction: Learning Rate 0.0001 ~ 0.0002!
👉 After extensive testing, 2000 steps is a suitable value.

Image

Datasets: enable Cache Latents and set Resolution to 512, 768 ~

A maximum image side of 1024 may not fit in 12 GB of VRAM, but you can try it!
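
To keep the screenshot settings in one place, here they are as a plain Python dict. The key names are my own shorthand for the UI fields, not ai-toolkit's actual config schema, so treat this as a checklist rather than a config file.

```python
# Shorthand summary of the low-VRAM settings described above (field names are mine).
low_vram_settings = {
    "model": "Z-Image Turbo",          # base model chosen when creating the job
    "transformer_offload_percent": 0,  # offloading errored out in testing, keep at 0%
    "trigger_word": "your_trigger",    # replace with your own trigger word
    "learning_rate": 1e-4,             # 0.0001 ~ 0.0002 worked well
    "steps": 2000,
    "cache_latents": True,
    "resolutions": [512, 768],         # 1024 may not fit in 12 GB of VRAM
}
```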

Image Image

👆 As you can see, training started successfully on 12 GB of VRAM, and the speed is quite good! 👇

first_lora_v1:  26%|##5       | 519/2000 [24:43<1:10:54,  2.87s/it, lr: 2.0e-04 loss: 3.811e-01]

Training speed is approximately 2~3 seconds per iteration; 2000 steps take about 1~2 hours to complete.
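
The quoted 1~2 hours follows directly from the step count and the per-step time shown in the progress bar, e.g.:

```python
# Back-of-the-envelope training time (plain arithmetic, nothing ai-toolkit-specific).
steps = 2000
sec_per_it = 2.87                       # value shown in the progress bar above
total_minutes = steps * sec_per_it / 60
print(f"~{total_minutes:.0f} min (~{total_minutes / 60:.1f} h)")  # ~96 min, about 1.6 h
```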

Finally: Wishing users with low VRAM success in training their own z-image LoRA!
Thanks to ai-toolkit and z-image, have fun! If you have better training settings, please share! 🤗

Image

This is my second Z-Image LoRA, saved in ai-toolkit\output.

Now you can use it in ComfyUI via the LoRA loader with Z-Image-Turbo ~

juntaosun avatar Dec 01 '25 02:12 juntaosun

thank you, very useful.

leetraman822 avatar Dec 01 '25 05:12 leetraman822

How do you train the full checkpoint model?

bank010 avatar Dec 01 '25 08:12 bank010

@bank010 I trained a LoRA using Z-Image-Turbo.

juntaosun avatar Dec 01 '25 08:12 juntaosun

@bank010 I trained a LoRA using Z-Image-Turbo.

Do you know how to fine-tune the full Z-Image-Turbo model? Not LoRA fine-tuning, but full fine-tuning.

bank010 avatar Dec 01 '25 09:12 bank010

5070TI 16G first try

Image

yamasoo avatar Dec 01 '25 11:12 yamasoo

@yamasoo The 5070TI 16G can train at 1024 resolution.

juntaosun avatar Dec 01 '25 11:12 juntaosun

@juntaosun Thank you for sharing your information. This was very useful. After some testing I got 1024 resolution running on my RTX 3060 with 12 GB VRAM. I used the standard settings for Z-Image Turbo (like Transformer = float8 and Resolutions = 512, 768, 1024), except for the following changes:

  1. Optimizer: Adafactor
  2. Learning Rate = 0.0003
  3. Steps = 1200
  4. Cache Text Embeddings
  5. Cache Latents
  6. Sample Resolution: Width = 768, Height = 1024 (with 1024 x 1024 it hangs right at the first sample)

It runs hard at the limit, at about 11.5 - 11.7 GB VRAM usage. Speed was approx. 6 - 10 s/iteration. With Adafactor and a Learning Rate of 0.0003, approx. 1100 steps seems to be optimal. I'm not really sure, but Cache Text Embeddings and Cache Latents seem to reduce VRAM usage significantly. Hope this helps some others too.
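
For anyone curious how a fixed learning rate and Adafactor fit together outside the UI, here is a minimal sketch using the Hugging Face transformers implementation; ai-toolkit may configure its optimizer differently internally, so treat this only as an illustration.

```python
import torch
from transformers import Adafactor

lora_params = [torch.nn.Parameter(torch.zeros(4, 4))]  # stand-in for real LoRA weights

optimizer = Adafactor(
    lora_params,
    lr=3e-4,               # the fixed 0.0003 used above
    relative_step=False,   # must be False when an explicit lr is given
    scale_parameter=False,
    warmup_init=False,
)
```

In that implementation, passing an explicit lr together with relative_step=True raises an error, which is why relative_step is disabled here.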

cmyknao avatar Dec 01 '25 17:12 cmyknao

@cmyknao Thank you for sharing your information.

juntaosun avatar Dec 02 '25 03:12 juntaosun

Thank you very much for these settings. I was trying to use another set that claimed to work on 12GB VRAM but kept getting an OOM near the start. It hasn't finished yet but the training is progressing, which is further than before. I suspect it may have been the cache latents or layer offloading that was the issue, as they were both off in my earlier attempts. Fingers crossed for this attempt.

wideload1971 avatar Dec 02 '25 11:12 wideload1971

@wideload1971 After starting, it uses more than 12 GB of VRAM for a short time. (Unfortunately, I can't remember at which step.) But after that, everything runs smoothly.
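
If you want to pinpoint where that short spike happens, one option is to log PyTorch's peak-allocation counter around each step. A minimal sketch, assuming a CUDA GPU and that you can drop a couple of lines into the training loop (ai-toolkit may already report something similar):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training step here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated this step: {peak_gb:.2f} GB")
```

Note that this counts memory allocated by PyTorch tensors; the number reported by nvidia-smi (which includes the CUDA context and cached blocks) will be somewhat higher.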

cmyknao avatar Dec 02 '25 12:12 cmyknao

Thank you, it works very well on my rig:

Ryzen 9 3900 + RTX 4070 Super

ladydarkness avatar Dec 02 '25 21:12 ladydarkness

ty this is awesome :333

thinhlpg avatar Dec 03 '25 15:12 thinhlpg

There is no Z-Image option in the model architecture list.

7ywx avatar Dec 04 '25 06:12 7ywx

(Quoting cmyknao's settings above: Adafactor, Learning Rate 0.0003, 1200 steps, Cache Text Embeddings, Cache Latents, Sample Resolution 768 x 1024.)

Sorry in advance for the "tangential" question: do you have to add the relative_step parameter to use Adafactor?

Signorlimone avatar Dec 05 '25 00:12 Signorlimone