brian6091
https://discuss.pytorch.org/t/why-are-my-tensors-gradients-unexpectedly-none-or-not-none/111461 https://discuss.pytorch.org/t/element-0-of-tensors-does-not-require-grad-and-does-not-have-a-grad-fn/32908
Manually setting `loss.requires_grad=True` (fp16) reveals this:

> Steps: 0% 0/1000 [00:00
> train_unet_module_or_class: [attn2]
> train_unet_submodule: [to_k, to_v]
> #
> train_text_module_or_class: [embeddings]
> train_text_submodule: [token_embedding]
> #
> lora_unet_layer: null
> lora_unet_train_off_target: null
> lora_unet_rank: 4
> lora_unet_alpha: 4.0
> ...
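(For context: the rank/alpha pair in that config is the usual LoRA parameterisation, where a frozen weight gets a low-rank update scaled by alpha/rank. A generic toy sketch of the idea, not this trainer's implementation:)

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a rank-r update scaled by alpha / rank."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)           # frozen pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                   # update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```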
So it took a little time, but I traced this to setting `gradient_checkpointing: True`, though only for the text_encoder. It has a method name different from the Unet's, since it...
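(For reference, the method-name mismatch being described: the diffusers UNet and the transformers text encoder expose gradient checkpointing under different names, so a single flag applied to both is easy to wire up wrong. A minimal sketch; the model id is just a placeholder, not necessarily what the notebook uses:)

```python
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
)

# diffusers models: enable_gradient_checkpointing()
unet.enable_gradient_checkpointing()

# transformers models: gradient_checkpointing_enable()
text_encoder.gradient_checkpointing_enable()
```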
@amerkay Glad you can follow given the rather uninformative commit messages! Lots of changes coming up. I should have it ready for testing in a couple of days. Inference part...
Cool, I'll ping you when I get the training part finished. I'm waiting for the safetensors pull request to get merged into LoRA to decide exactly how to go about...
@amerkay
> I can help with that. I had to modify the main branch notebook to "monkey patch" LoRA for inference.

Ah yes, I forgot to do that. I've...
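(For anyone following along, the inference-time "monkey patch" being referred to is roughly the pattern from cloneofsimo's lora_diffusion helpers. Function names and signatures depend on the installed version, and the model id, weight path, and prompt below are placeholders, so treat this as a sketch rather than the notebook's exact cell:)

```python
import torch
from diffusers import StableDiffusionPipeline
from lora_diffusion import monkeypatch_lora, tune_lora_scale

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Inject the trained low-rank weights into the UNet's attention layers
monkeypatch_lora(pipe.unet, torch.load("lora_weight.pt"))

# Blend strength of the LoRA update relative to the base weights
tune_lora_scale(pipe.unet, 1.0)

image = pipe("a photo of sks person", num_inference_steps=30).images[0]
```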
Sorry, I just meant the inference part that you had to add to the notebook on main.
https://discuss.pytorch.org/t/how-pytorch-releases-variable-garbage/7277?u=ptrblck https://discuss.pytorch.org/t/how-to-free-gpu-memory-changing-architectures-while-training/67261 https://discuss.pytorch.org/t/time-memory-keeps-increasing-at-every-iteration/111453
A100-SXM4-40GB
* GPU=31142/40536MiB, 32814 after first save, 33302 after 2nd save
* 1.03 s/it training, 3.30 s/it inference
* BATCH_SIZE=4
* TRAIN_TEXT_ENCODER
* USE_8BIT_ADAM
* FP16
* GRADIENT_CHECKPOINTING
* GRADIENT_ACCUMULATION_STEPS=1
* USE_EMA=False
...
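(The linked PyTorch threads boil down to the standard pattern for checking whether memory actually comes back after a save/inference step: drop every reference, collect, then release cached blocks. A self-contained sketch, with a dummy tensor standing in for a temporary inference pipeline:)

```python
import gc
import torch

# Stand-in for a temporary pipeline built to save sample images
tmp = torch.zeros(1024, 1024, 256, device="cuda")  # ~1 GiB
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")

# Drop every reference, then release cached blocks back to the driver
del tmp
gc.collect()
torch.cuda.empty_cache()
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB, "
      f"reserved: {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```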