xformers [diffusers] training is broken with xformers and PyTorch 2.1

Open sayakpaul opened this issue 8 months ago • 4 comments

Related issue: https://github.com/huggingface/diffusers/issues/5368

When using PyTorch 2.1 and the latest stable build of xformers, our DreamBooth LoRA script for SDXL doesn't work. https://github.com/huggingface/diffusers/issues/5368 provides more details.

However, when using SDPA (PyTorch's native scaled dot-product attention) in the same environment, i.e., without xformers, the issue seems to go away.

Dev environment for this can be found here: https://github.com/huggingface/diffusers/blob/main/docker/diffusers-pytorch-compile-cuda/Dockerfile

When using PyTorch 2.0.1 with xformers==0.0.21, there seem to be no issues with the exact same script. PyTorch was installed with `pip install torch==2.0.1+cu117 --index-url https://download.pytorch.org/whl/cu117` inside a Docker image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04.
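As a stopgap, the observations above could be turned into a small guard that picks the attention backend from the installed versions. The following is only a sketch: `parse_release` and `choose_attention_backend` are hypothetical helpers, not part of diffusers or xformers, and the "PyTorch 2.1 or newer means avoid xformers" rule is an assumption drawn solely from this report. It also assumes PyTorch 2.0 or newer, so that SDPA is available as a fallback.

```python
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the numeric release from a version string.

    The local build tag (e.g. "+cu117") and any pre-release suffix are dropped.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at segments such as "dev20231001"
        parts.append(int(digits))
    return tuple(parts)


def choose_attention_backend(torch_version: str, xformers_version: Optional[str]) -> str:
    """Hypothetical helper: return "xformers" or "sdpa".

    Assumption (from this issue only): xformers attention breaks training
    on PyTorch 2.1, while PyTorch 2.0.1 with xformers 0.0.21 works.
    """
    if xformers_version is None:
        return "sdpa"  # xformers is not installed, nothing to choose
    if parse_release(torch_version)[:2] >= (2, 1):
        return "sdpa"  # avoid the combination reported as broken
    return "xformers"


if __name__ == "__main__":
    print(choose_attention_backend("2.0.1+cu117", "0.0.21"))
    print(choose_attention_backend("2.1.0", "0.0.22"))
```

In a training script, the returned value could decide whether to call `unet.enable_xformers_memory_efficient_attention()` or leave the default attention processor in place; in practice the version strings would come from `torch.__version__` and `xformers.__version__`.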

Cc: @patrickvonplaten @williamberman