ValueError: Query/Key/Value should either all have the same dtype when training LoRA
Describe the bug
LoRA training doesn't work with mixed precision enabled:
File "/home/imgen/miniconda3/envs/py32/lib/python3.11/site-packages/xformers/ops/fmha/init.py", line 348, in _memory_efficient_at tention_forward_requires_grad inp.validate_inputs() File "/home/imgen/miniconda3/envs/py32/lib/python3.11/site-packages/xformers/ops/fmha/common.py", line 121, in validate_inputs raise ValueError( ValueError: Query/Key/Value should either all have the same dtype, or (in the quantized case) Key/Value should have dtype torch.int32 query.dtype: torch.float32 key.dtype : torch.float16 value.dtype: torch.float16
The full stack trace is in the attachment.
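For context, the error means the query/key/value projections reached xformers in different dtypes. A plausible way this happens with --mixed_precision=fp16 is that the trainable LoRA parameters are kept in float32 for numerical stability while the frozen base weights are cast to float16. A minimal sketch of that situation (the layer names and shapes are illustrative, not the training script's actual code; fp16 matmul needs a GPU):

import torch

device = "cuda"
hidden_states = torch.randn(1, 77, 640, device=device, dtype=torch.float16)

to_q = torch.nn.Linear(640, 640).to(device)         # trainable (LoRA) path, float32
to_k = torch.nn.Linear(640, 640).to(device).half()  # frozen base path, float16

query = to_q(hidden_states.float())  # float32
key = to_k(hidden_states)            # float16
print(query.dtype, key.dtype)        # torch.float32 torch.float16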
Reproduction
export MODEL_NAME=/home/imgen/models/SDXL/juggernautXL_v8Rundiffusion/
export OUTPUT_DIR=`pwd`/poke11-lora
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch examples/text_to_image/train_text_to_image_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--dataloader_num_workers=8 \
--resolution=1024 \
--center_crop \
--random_flip \
--enable_xformers_memory_efficient_attention \
--mixed_precision=fp16 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=15000 \
--learning_rate=1e-04 \
--max_grad_norm=1 \
--lr_scheduler="cosine" \
--lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR} \
--gradient_checkpointing \
--use_8bit_adam \
--checkpointing_steps=500 \
--validation_prompt="A pokemon with red nose." \
--seed=1337
Logs
System Info
pytorch 2.1.2+cu118
diffusers 0.26.0.dev0
accelerate 0.26.1
Who can help?
@sayakpaul @patrickvonplaten
It's a known problem when using xformers. I recommend building xformers from source to fix it.
@sayakpaul I built xformers from source, but the issue is still present.
Does it work with PyTorch 2.0.0?
Looks the same with PyTorch 2.0.0 and xformers installed with pip install xformers==0.0.19:
File "/home/imgen/miniconda3/envs/py31/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 317, in _memory_efficient_attention_forward_requires_grad
inp.validate_inputs()
File "/home/imgen/miniconda3/envs/py31/lib/python3.10/site-packages/xformers/ops/fmha/common.py", line 73, in validate_inputs
raise ValueError(
ValueError: Query/Key/Value should all have the same dtype
Steps: 0%| | 0/15000 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/home/imgen/miniconda3/envs/py31/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/imgen/miniconda3/envs/py31/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/imgen/miniconda3/envs/py31/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/home/imgen/miniconda3/envs/py31/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/imgen/miniconda3/envs/py31/bin/python', 'examples/text_to_image/train_text_to_image_lora_sdxl.py', '--pretrained_model_name_or_path=/home/imgen/models/SDXL/juggernautXL_v8Rundiffusion/', '--dataset_name=lambdalabs/pokemon-blip-captions', '--dataloader_num_workers=8', '--resolution=1024', '--center_crop', '--random_flip', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--max_train_steps=15000', '--learning_rate=1e-04', '--max_grad_norm=1', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--output_dir=/home/imgen/projects/diffusers/poke11-lora', '--gradient_checkpointing', '--use_8bit_adam', '--checkpointing_steps=500', '--validation_prompt=A pokemon with red nose.', '--seed=1337']' returned non-zero exit status 1.
(py31) imgen@k6:~/projects/diffusers$ python3
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.0.0+cu118'
>>> import xformers
>>> xformers.__version__
'0.0.19'
>>>
How about PyTorch 1.13?
I have seen people reporting the same error and resolving it with xformers built from source. Example: https://github.com/huggingface/accelerate/issues/2182#issuecomment-1864127640. If that doesn't solve the problem, it could very well be a recent PyTorch / xformers training incompatibility issue. Sadly, we don't have time to look into that right now.
Do you happen to have a time frame for when you can look into that @sayakpaul?
Sadly, no.
Multiple folks have concluded that it is a PyTorch/xFormers version/build issue that arises even outside of diffusers.
So, we need to be cognizant of that.
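For reference, the check can be tripped with xformers alone, outside of any diffusers code. A minimal sketch (shapes are arbitrary; requires a CUDA GPU with xformers installed):

import torch
import xformers.ops as xops

q = torch.randn(1, 77, 8, 64, device="cuda", dtype=torch.float32)
k = torch.randn(1, 77, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 77, 8, 64, device="cuda", dtype=torch.float16)

# validate_inputs() rejects the call before any kernel is dispatched:
# ValueError: Query/Key/Value should all have the same dtype
xops.memory_efficient_attention(q, k, v)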
Facing the same issue. Using the following versions:
>>> torch.__version__
'2.2.0+cu121'
>>> import xformers
>>> xformers.__version__
'0.0.24'
This occurs at 200 steps, when the validation block runs.
Does it not happen when you run without xformers? Also, you're on the latest diffusers, training with peft, yeah?
It doesn't happen without xformers. I also tried using matching pre-compiled binaries for CUDA 11.8 for both PyTorch and xformers, but the issue still persisted. Removing the xformers flag from the launch command did help. Yes, I'm using the latest diffusers and training with peft (the LoRA script from the LCM/consistency-distillation example).
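For anyone else hitting this: since removing the flag works, one option on PyTorch 2.x is to opt into the built-in scaled-dot-product-attention processor explicitly instead of xformers. A sketch, assuming a local SDXL checkpoint (the path is a placeholder):

from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

# Placeholder path; point this at your checkpoint directory.
unet = UNet2DConditionModel.from_pretrained("/path/to/sdxl-checkpoint", subfolder="unet")

# Route attention through torch.nn.functional.scaled_dot_product_attention,
# which is also memory-efficient on PyTorch >= 2.0.
unet.set_attn_processor(AttnProcessor2_0())

On PyTorch 2.x this processor should already be the default, which is consistent with the error disappearing once the xformers flag is dropped.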
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
not stale
Any updates on this?
https://github.com/facebookresearch/xformers/issues/934 is where we're at. Sadly, we cannot do much beyond that.
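Until that upstream issue is resolved, one possible stopgap is to force the three tensors into a single dtype right before the xformers call. A hypothetical monkeypatch sketch, applied at the top of the training script; it slightly changes numerics and is not an endorsed fix:

import xformers.ops

_orig_mea = xformers.ops.memory_efficient_attention

def _mea_same_dtype(query, key, value, *args, **kwargs):
    # Cast key/value to the query dtype so validate_inputs() passes.
    key = key.to(query.dtype)
    value = value.to(query.dtype)
    return _orig_mea(query, key, value, *args, **kwargs)

# diffusers looks up xformers.ops.memory_efficient_attention at call time,
# so reassigning the module attribute takes effect.
xformers.ops.memory_efficient_attention = _mea_same_dtype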
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.