Dreambooth
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Running with LoRA restricted to the text_encoder, with no unet training, produces the error in the title.
Related PyTorch forum threads:
https://discuss.pytorch.org/t/why-are-my-tensors-gradients-unexpectedly-none-or-not-none/111461
https://discuss.pytorch.org/t/element-0-of-tensors-does-not-require-grad-and-does-not-have-a-grad-fn/32908
Manually setting loss.requires_grad=True (fp16) reveals this:
Steps: 0% 0/1000 [00:00<?, ?it/s]
Before loss= tensor(0.6998, device='cuda:0') loss.requires_grad= False leaf= True grad_fn= None
After loss= tensor(0.6998, device='cuda:0', requires_grad=True) loss.requires_grad= True
Traceback (most recent call last):
  File "/content/Dreambooth/finetune.py", line 697, in <module>
    main(args)
  File "/content/Dreambooth/finetune.py", line 619, in main
    optimizer.step()
  File "/usr/local/lib/python3.8/dist-packages/accelerate/optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 339, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
However, in fp32 (mixed_precision="no") it runs, but loss.requires_grad is False on every iteration:
Steps: 10% 98/1000 [02:23<21:06, 1.40s/it, GPU=9038, Loss/pred=0.1, Loss/prior=0.0045, Loss/total=0.105, lr/text=4.9e-5]
Before loss= tensor(0.4947, device='cuda:0') loss.requires_grad= False leaf= True grad_fn= None
After loss= tensor(0.4947, device='cuda:0', requires_grad=True) loss.requires_grad= True
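To see why forcing loss.requires_grad does not help, here is a minimal standalone repro (a sketch, not the finetune.py code): a loss with no grad_fn is already detached from the frozen parameters, so making it require grad only turns it into a leaf that starts and ends its own graph.

```python
import torch

model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)       # everything frozen, as in the failing run

x = torch.randn(2, 4)             # the input does not require grad either
loss = model(x).pow(2).mean()     # built entirely from non-grad tensors
print("before:", loss.requires_grad, loss.grad_fn)   # False None

loss.requires_grad_(True)         # the manual hack
print("after:", loss.requires_grad, loss.grad_fn)    # True None -- a detached leaf

loss.backward()                   # runs, but no gradient reaches any parameter
print("weight.grad:", model.weight.grad)             # None
```

In fp32 this "trains" without ever updating anything; in fp16 it also appears to explain the assertion above, since GradScaler.step() records its inf checks from parameter gradients, and with no parameter ever receiving one, "No inf checks were recorded for this optimizer" fires.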
Below are configs and logs from a sweep of training-target configurations. Every run that includes unet training proceeds normally (the loss keeps its grad_fn); the two text_encoder-only runs fail at step 0 with the RuntimeError, and the same text_encoder-only setup runs once gradient_checkpointing is disabled (last block).

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: [embeddings] train_text_submodule: [token_embedding]
lora_unet_layer: null lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: true
Steps: 0% 48/10000 [01:14<3:21:28, 1.21s/it, GPU=10718, Loss/pred=0.368, Loss/prior=0.0126, Loss/total=0.19, lr/token=2.4e-7, lr/unet=4.8e-7]
Before loss= tensor(0.3415, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f2c7db1fe80>) grad(None)
After loss= tensor(0.1707, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f2c7db1fe80>) grad(None)

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: [embeddings] train_text_submodule: [token_embedding]
lora_unet_layer: [Linear] lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: true separate_token_embedding: true
Steps: 0% 6/10000 [00:12<4:09:57, 1.50s/it, GPU=10666, Loss/pred=0.152, Loss/prior=0.212, Loss/total=0.182, lr/token=3e-8, lr/unet=6e-8]
Before loss= tensor(0.3792, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fcb2e93b7f0>) grad(None)
After loss= tensor(0.1896, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7fcb2e93b7f0>) grad(None)

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: [embeddings] train_text_submodule: [token_embedding]
lora_unet_layer: [Linear] lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: true separate_token_embedding: false
Steps: 0% 2/10000 [00:07<9:09:35, 3.30s/it, GPU=10664, Loss/pred=0.252, Loss/prior=0.184, Loss/total=0.218, lr/token=2e-8, lr/unet=1e-7]
Before loss= tensor(0.2853, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f3c2179d460>) grad(None)
After loss= tensor(0.1426, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f3c2179d460>) grad(None)

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: null train_text_submodule: null
lora_unet_layer: [Linear] lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: true separate_token_embedding: false
Steps: 0% 4/10000 [00:09<4:38:40, 1.67s/it, GPU=10284, Loss/pred=0.335, Loss/prior=0.0151, Loss/total=0.175, lr/unet=2e-7]
Before loss= tensor(0.0186, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f4021a96c70>) grad(None)
After loss= tensor(0.0093, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f4021a96c70>) grad(None)

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: null train_text_submodule: null
lora_unet_layer: [Linear] lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: false separate_token_embedding: false
Steps: 0% 7/10000 [00:13<3:37:29, 1.31s/it, GPU=10210, Loss/pred=0.406, Loss/prior=0.01, Loss/total=0.208, lr/unet=4e-7]
Before loss= tensor(0.1137, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f0c698ca3a0>) grad(None)
After loss= tensor(0.0569, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f0c698ca3a0>) grad(None)

train_unet_module_or_class: [attn2] train_unet_submodule: [to_k, to_v]
train_text_module_or_class: null train_text_submodule: null
lora_unet_layer: [Linear] lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: false separate_token_embedding: false
Steps: 0% 2/10000 [00:07<8:40:02, 3.12s/it, GPU=10244, Loss/pred=0.214, Loss/prior=0.219, Loss/total=0.217, lr/unet=1e-7]
Before loss= tensor(0.3070, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7f61ed950250>) grad(None)
After loss= tensor(0.1535, device='cuda:0', grad_fn=<DivBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<DivBackward0 object at 0x7f61ed950250>) grad(None)

train_unet_module_or_class: null train_unet_submodule: null
train_text_module_or_class: [CLIPAttention] train_text_submodule: [k_proj, q_proj, v_proj, out_proj]
lora_unet_layer: null lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: null lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: false separate_token_embedding: false
Steps: 0% 0/10000 [00:00<?, ?it/s]
Before loss= tensor(0.7092, device='cuda:0') is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After loss= tensor(0.3546, device='cuda:0') is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
  File "/content/Dreambooth/finetune.py", line 701, in <module>
    main(args)
  File "/content/Dreambooth/finetune.py", line 618, in main
    accelerator.backward(loss)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null train_unet_submodule: null
train_text_module_or_class: [CLIPAttention] train_text_submodule: [k_proj, q_proj, v_proj, out_proj]
lora_unet_layer: null lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: [Linear] lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: false separate_token_embedding: false
Steps: 0% 0/10000 [00:00<?, ?it/s]
Before loss= tensor(0.6998, device='cuda:0') is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
After loss= tensor(0.3499, device='cuda:0') is_leaf(True) requires_grad(False) retains_grad(False) grad_fn(None) grad(None)
Traceback (most recent call last):
  File "/content/Dreambooth/finetune.py", line 701, in <module>
    main(args)
  File "/content/Dreambooth/finetune.py", line 618, in main
    accelerator.backward(loss)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1314, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

train_unet_module_or_class: null train_unet_submodule: null
train_text_module_or_class: [CLIPAttention] train_text_submodule: [k_proj, q_proj, v_proj, out_proj]
lora_unet_layer: null lora_unet_train_off_target: null lora_unet_rank: 4 lora_unet_alpha: 4.0
lora_text_layer: [Linear] lora_text_train_off_target: null lora_text_rank: 4 lora_text_alpha: 4.0
add_instance_token: false separate_token_embedding: false gradient_checkpointing: false
Steps: 0% 10/10000 [00:13<2:25:40, 1.14it/s, GPU=9436, Loss/pred=0.0642, Loss/prior=0.0144, Loss/total=0.0393, lr/text=1e-7]
Before loss= tensor(0.0557, device='cuda:0', grad_fn=<AddBackward0>) is_leaf(False) requires_grad(True) retains_grad(False) grad_fn(<AddBackward0 object at 0x7fa1936883d0>) grad(None)
After loss= tensor(0.0279, device='cuda:0', grad_fn=<DivBackward0>)

It took a little time, but I traced this to setting gradient_checkpointing: true, specifically for the text_encoder. The text_encoder exposes gradient checkpointing under a different method name than the unet, since it comes from the Transformers library rather than Diffusers. Either way, enabling it somehow leaves the loss as a leaf tensor (is_leaf=True) with no grad_fn.
For now, I have disabled gradient_checkpointing for the text_encoder.
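For reference, a rough sketch of the two APIs involved and a possible alternative workaround (the model id and the enable_input_require_grads() call are my additions, not code from finetune.py). With re-entrant gradient checkpointing, a checkpointed segment whose inputs do not require grad produces an output with no grad_fn even when trainable LoRA weights live inside it, which would explain the leaf loss seen above:

```python
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

# Example base model id (an assumption; substitute whatever the run actually uses).
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Same feature, two different method names:
unet.enable_gradient_checkpointing()          # Diffusers (UNet2DConditionModel)
text_encoder.gradient_checkpointing_enable()  # Transformers (CLIPTextModel)

# Possible alternative to disabling checkpointing for the text_encoder:
# Transformers' PreTrainedModel.enable_input_require_grads() hooks the input
# embeddings so the checkpointed blocks receive an input that requires grad,
# keeping the loss connected to the graph even when only text-side
# LoRA/attention weights are trainable.
text_encoder.enable_input_require_grads()
```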