img2img-turbo
Add mixed precision training support for CycleGAN-Turbo
Hi Gaurav,
I've added mixed precision support for training CycleGAN-Turbo, so that unpaired training can run on a 24 GB NVIDIA GPU.
I tried to run this, but it fails:
Loading model from: /home/ubuntu/miniconda3/envs/img2img-turbo/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Steps: 0%| | 0/25000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/ubuntu/repos/img2img-turbo/src/train_cyclegan_turbo.py", line 410, in <module>
main(args)
File "/home/ubuntu/repos/img2img-turbo/src/train_cyclegan_turbo.py", line 213, in main
accelerator.clip_grad_norm_(params_gen, args.max_grad_norm)
File "/home/ubuntu/miniconda3/envs/img2img-turbo/lib/python3.10/site-packages/accelerate/accelerator.py", line 2157, in clip_grad_norm_
self.unscale_gradients()
File "/home/ubuntu/miniconda3/envs/img2img-turbo/lib/python3.10/site-packages/accelerate/accelerator.py", line 2107, in unscale_gradients
self.scaler.unscale_(opt)
File "/home/ubuntu/miniconda3/envs/img2img-turbo/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
File "/home/ubuntu/miniconda3/envs/img2img-turbo/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
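For context, this error usually means the trainable parameters themselves (and therefore their gradients) are stored in fp16: GradScaler refuses to unscale fp16 gradients, and clip_grad_norm_ triggers the unscale. The standard AMP pattern keeps trainable weights in fp32 and lets autocast down-cast only the forward computation. A minimal sketch of that pattern (illustrative model only, shown on CPU with bf16, not the repo's code):

```python
import torch
import torch.nn as nn

# Trainable parameters stay in fp32 ("master weights"); gradients will
# then also be fp32, which GradScaler / clip_grad_norm_ can handle.
model = nn.Linear(8, 8)
x = torch.randn(4, 8)

# autocast runs the forward pass in lower precision without converting
# the fp32 parameters in place (CPU + bf16 here purely for illustration).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(model.weight.dtype)  # torch.float32
print(y.dtype)             # torch.bfloat16
```

If the training script instead casts the whole model to fp16 before wrapping the optimizer, the unscale error above is the expected outcome.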
Hi, please try setting the mixed precision to bf16; that should work. My local GPU is an NVIDIA GeForce RTX 4090 24GB.
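For reference, one way to select bf16 is through accelerate's standard mixed-precision flag (a config fragment; the script path follows the repo layout shown in the traceback above, and the remaining training arguments are elided):

```shell
# bf16 has the same dynamic range as fp32, so it avoids the fp16
# gradient-unscaling path that raised the ValueError above.
accelerate launch --mixed_precision "bf16" src/train_cyclegan_turbo.py  # plus your usual training args
```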
Hi @King-HAW, thanks for sharing. I run into the following problem when I use mixed precision:
ValueError: Query/Key/Value should either all have the same dtype, or (in the quantized case) Key/Value should have dtype torch.int32
query.dtype: torch.float32 key.dtype : torch.bfloat16 value.dtype: torch.bfloat16
But I solved this problem by running accelerate without --enable_xformers_memory_efficient_attention, following https://github.com/huggingface/accelerate/issues/2182.
Did you run into the same problem, and if so, how did you solve it?
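The error message above says the query is fp32 while key/value are bf16; memory-efficient attention kernels require all three to share a dtype. Besides dropping the xformers flag, another workaround is casting query/key/value to a common dtype before the attention call. A sketch, using torch's built-in scaled_dot_product_attention as a stand-in for xformers' op (both enforce the same dtype constraint):

```python
import torch
import torch.nn.functional as F

# Reproduce the mismatch from the error message: fp32 query, bf16 key/value.
q = torch.randn(1, 4, 16, dtype=torch.float32)
k = torch.randn(1, 4, 16, dtype=torch.bfloat16)
v = torch.randn(1, 4, 16, dtype=torch.bfloat16)

# Cast everything to one dtype before calling the fused attention kernel;
# here we follow the key/value dtype, matching the bf16 training setup.
common = k.dtype
out = F.scaled_dot_product_attention(q.to(common), k.to(common), v.to(common))
print(out.dtype)  # all inputs now agree, so the call succeeds
```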
Hi @King-HAW, I have tried your fork, but the out-of-memory error still occurs (I use a 3090 with 24GB VRAM). Could you please explain how I can fix this error?
Thank you so much.