diffusers
SD3 and Gradient checkpointing gives error and crashes
Describe the bug
Activating --gradient_checkpointing in either the LoRA or DreamBooth training scripts for SD3 causes `TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, not tuple`, which crashes the run. Without the flag, LoRA training runs fine at about 20 GB of VRAM with batch size 1 and AdamW8bit.
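For context, this kind of error typically appears when a checkpointed block returns a tuple and the raw return value is fed into a norm layer. A minimal standalone sketch (the `Block` class and shapes are illustrative, not actual diffusers code):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Hypothetical transformer-style block whose forward returns a tuple
# (hidden_states, extra), as many diffusion-model blocks do.
class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return self.linear(x), None  # tuple output

norm = nn.LayerNorm(8)
block = Block(8)
x = torch.randn(2, 8, requires_grad=True)

# checkpoint() passes the tuple through unchanged; feeding it straight
# into LayerNorm reproduces the reported TypeError.
out = checkpoint(block, x, use_reentrant=False)
try:
    norm(out)  # out is a tuple, not a Tensor
except TypeError as e:
    print(type(e).__name__)

# Unpacking the tensor before the norm avoids the crash (a sketch of
# the general shape of the fix, not the actual PR diff):
hidden, _ = out
y = norm(hidden)
```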
Reproduction
Add --gradient_checkpointing to training parameters.
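A reproduction command along these lines (model path, data directory, and prompt are illustrative placeholders; the flags follow the diffusers DreamBooth LoRA SD3 example script, with --gradient_checkpointing being the addition that triggers the crash):

```shell
accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="./dog" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="./sd3-lora" \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing
```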
Logs
No response
System Info
- 🤗 Diffusers version: 0.29.0.dev0
- Platform: Windows-10-10.0.19045-SP0
- Running on a notebook?: No
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.2.1+cu118 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.23.3
- Transformers version: 4.41.2
- Accelerate version: 0.31.0
- PEFT version: 0.11.1
- Bitsandbytes version: 0.43.0
- Safetensors version: 0.4.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 3090 (24576 MiB VRAM); NVIDIA GeForce RTX 4090 (24564 MiB VRAM)
- Using GPU in script?: RTX 4090
- Using distributed or parallel set-up in script?: No DDP or similar parallel setups.
Who can help?
No response
I wish I'd looked sooner, haha. I was hunting this one down.
@sayakpaul @DN6 i can confirm this one
Can confirm this error happens with --gradient_checkpointing during LoRA training.
diffusers 0.29.0
I have fixed this here: https://github.com/huggingface/diffusers/pull/8542
Since #8542 was merged, can we close this?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this since #8542 seems like the fix and due to inactivity to @DN6's question. If the issue still persists, please LMK and re-open this so we can work on it asap