RuntimeError: Trying to backward through the graph a second time
I think training all the networks together will give better results, so I set

reference_unet.requires_grad_(True)
denoising_unet.requires_grad_(True)
pose_guider.requires_grad_(True)

However, with gradient_checkpointing: True in the training config YAML, it raises RuntimeError: Trying to backward through the graph a second time. I found that if I instead set

reference_unet.requires_grad_(False)
denoising_unet.requires_grad_(True)
pose_guider.requires_grad_(True)

everything runs fine. So what's wrong with reference_unet? Could you please help?
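For what it's worth, the error pattern can be reproduced outside the repo. Here is a minimal sketch with stand-in modules (not the project's actual code): if the reference features are computed once but consumed by more than one backward pass, the second backward tries to walk through reference_unet's already-freed graph.

```python
import torch

ref_net = torch.nn.Linear(4, 4)   # stand-in for reference_unet
den_net = torch.nn.Linear(4, 4)   # stand-in for denoising_unet

x = torch.randn(2, 4)
ref_feats = ref_net(x)            # reference graph is built ONCE here

for _ in range(2):                # two backward passes reuse the cached features
    loss = den_net(ref_feats).sum()
    loss.backward()               # 1st call frees ref_net's graph; the 2nd raises
                                  # "Trying to backward through the graph a second time"
```

With reference_unet frozen via requires_grad_(False), ref_feats carries no autograd graph at all, so backward simply stops at the cached features and the reuse is harmless, which matches the behavior reported above.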
I also encountered this problem. When training referencenet with gradient_accumulation_step turned on, it reports RuntimeError: Trying to backward through the graph a second time. If I don't train referencenet, it runs normally. Have you found a solution?
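One workaround that fits the gradient-accumulation case (again a sketch with stand-in modules, not the repo's training loop): recompute the reference forward inside every micro-step, so each backward owns a fresh graph instead of reusing features cached outside the loop.

```python
import torch

ref_net = torch.nn.Linear(4, 4)   # stand-in for reference_unet / referencenet
den_net = torch.nn.Linear(4, 4)   # stand-in for denoising_unet
opt = torch.optim.AdamW(
    list(ref_net.parameters()) + list(den_net.parameters()), lr=1e-4
)

accum_steps = 2
micro_batches = [torch.randn(2, 4) for _ in range(accum_steps)]

opt.zero_grad()
for x in micro_batches:
    ref_feats = ref_net(x)        # fresh graph per micro-step, so nothing is reused
    loss = den_net(ref_feats).sum() / accum_steps
    loss.backward()
opt.step()
```

Calling loss.backward(retain_graph=True) also silences the error, but it keeps the whole reference graph alive across micro-steps, which costs extra memory.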
I have a similar problem. Have you found a solution?
Removing denoising_unet.enable_gradient_checkpointing() works, but then the GPUs run out of memory (OOM).
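If dropping checkpointing is not an option because of OOM, non-reentrant checkpointing may be worth trying: PyTorch's default reentrant implementation is the one known for turning graph reuse into this error, and use_reentrant=False is the variant PyTorch now recommends. Whether your diffusers version lets enable_gradient_checkpointing() pass this flag through is version-dependent, so the plain-PyTorch sketch below is an assumption about the mechanism, not the repo's API.

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
x = torch.randn(2, 4, requires_grad=True)

# Non-reentrant mode recomputes activations during backward without the
# nested backward call that the default (reentrant) implementation uses.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```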