train_text_to_image_lora.py fails when enable_xformers_memory_efficient_attention is True
Describe the bug
Hello, when I fine-tune the model with this script and enable enable_xformers_memory_efficient_attention, I hit the following error:
accelerator.backward(loss)
File "C:\Users\Environments\StableDiffusion\lib\site-packages\accelerate\accelerator.py", line 1316, in backward
loss.backward(**kwargs)
File "C:\Users\Environments\StableDiffusion\lib\site-packages\torch\_tensor.py", line 488, in backward
torch.autograd.backward(
File "C:\Users\Environments\StableDiffusion\lib\site-packages\torch\autograd\__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Has anyone fine-tuned Stable Diffusion with both LoRA and xFormers enabled?
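A quick way to narrow this down is to check, right before the training loop, whether the parameters handed to the optimizer still require gradients. The helper below is a hypothetical diagnostic (not part of the script); it only assumes you can pass it the module holding the LoRA weights:

```python
import torch


def count_trainable(module: torch.nn.Module) -> int:
    """Count the parameters that will actually receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Example with a plain module; in the training script you would pass the object
# whose parameters are given to the optimizer. A count of 0 means the loss has
# no grad_fn, which is exactly the RuntimeError above.
print(count_trainable(torch.nn.Linear(4, 4)))  # 4*4 weights + 4 biases -> 20
```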
Reproduction
I simply ran train_text_to_image_lora.py with enable_xformers_memory_efficient_attention set to True.
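For context, here is a minimal sketch of what that option toggles, assuming the standard diffusers 0.12.x API (UNet2DConditionModel.enable_xformers_memory_efficient_attention); the checkpoint id is only an example:

```python
from diffusers import UNet2DConditionModel

# Load just the UNet of a Stable Diffusion checkpoint (example model id).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# The training-script option ultimately calls this method, which enables
# xFormers memory-efficient attention on all of the UNet's attention blocks.
unet.enable_xformers_memory_efficient_attention()
```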
Logs
No response
System Info
- diffusers version: 0.12.1
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.8.10
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Huggingface_hub version: 0.12.0
- Transformers version: 0.15.0
- Accelerate version: not installed
- xFormers version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
xFormers 0.0.17.dev449
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.flshattF: available
memory_efficient_attention.flshattB: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
swiglu.fused.p.cpp: available
is_triton_available: False
is_functorch_available: False
pytorch.version: 1.13.1+cu116
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3090
build.info: available
build.cuda_version: 1107
build.python_version: 3.8.10
build.torch_version: 1.13.1+cu117
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-main
source.privacy: open source
Could you install `xformers` and `accelerate` and see if that resolves the issue? Running the efficient attention requires you to have `xformers` installed.
Yes, during inference xformers speeds things up significantly. The problem I described appears during fine-tuning.
No, I meant: do you have `xformers` and `accelerate` installed? Your system info shows:
- Accelerate version: not installed
- xFormers version: not installed
Cc: @patrickvonplaten
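If it helps, here is a quick sanity check you can run from the same Python environment the training script uses (a hypothetical snippet, not part of diffusers):

```python
import importlib.util

# Report whether xformers and accelerate are importable in this environment.
for pkg in ("xformers", "accelerate"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'not installed'}")
```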
This is similar to https://github.com/huggingface/diffusers/issues/2459; it should be fixed by the new PR https://github.com/huggingface/diffusers/pull/2464.
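If the root cause here matches that issue (an assumption; the linked issue and PR are not quoted in this thread), enabling xFormers after the LoRA attention processors are set replaces them with processors that have no trainable parameters, so nothing in the graph requires grad. A small inspection sketch, assuming the attn_processors property available in diffusers 0.12.x and an example checkpoint id:

```python
from collections import Counter

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"  # example checkpoint id
)
unet.enable_xformers_memory_efficient_attention()

# List the processor classes currently installed on the UNet's attention blocks.
# If LoRA processors were set earlier and no longer show up here, none of the
# LoRA weights will require grad, matching the RuntimeError in this issue.
print(Counter(type(p).__name__ for p in unet.attn_processors.values()))
```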
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.