Moore-AnimateAnyone

RuntimeError: Trying to backward through the graph a second time

Open TZYSJTU opened this issue 1 year ago • 2 comments

I think training all networks together will be better, so I set

    reference_unet.requires_grad_(True)
    denoising_unet.requires_grad_(True)
    pose_guider.requires_grad_(True)

However, when I set gradient_checkpointing: True in the training config YAML, it raised RuntimeError: Trying to backward through the graph a second time. And I found that if I set

    reference_unet.requires_grad_(False)
    denoising_unet.requires_grad_(True)
    pose_guider.requires_grad_(True)

it works fine. So what's wrong with the reference_unet? Could you please help?
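For context, this error message is generic PyTorch autograd behavior rather than anything specific to this repo: after the first backward, a graph's saved tensors are freed, so any second backward that walks the same subgraph fails. A minimal sketch (made-up tensors, not the repo's code) reproduces it:

```python
import torch

# Minimal sketch (not the repo's code): PyTorch frees a graph's saved
# tensors after the first backward, so a second backward through the
# same subgraph raises the error reported above.
x = torch.ones(3, requires_grad=True)
shared = x * x                # subgraph; its saved tensors are freed on backward
shared.sum().backward()       # first backward succeeds and frees the graph
try:
    shared.sum().backward()   # walks the already-freed subgraph again
except RuntimeError as e:
    print("reproduced:", "second time" in str(e))
```

This suggests that when reference_unet is trainable, some part of its graph (e.g. the cached reference features) is being backwarded through more than once per optimizer step.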

TZYSJTU avatar May 16 '24 08:05 TZYSJTU

I also encountered this problem. When training the referencenet with gradient_accumulation_step turned on, it reports RuntimeError: Trying to backward through the graph a second time. If I don't train the referencenet, everything works normally. Have you found a solution?

zhuochen02 avatar Oct 09 '24 05:10 zhuochen02

I have a similar problem. Have you found a solution?

Removing denoising_unet.enable_gradient_checkpointing() works, but then the GPUs go OOM.
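If the root cause is that cached reference features are backwarded through on more than one accumulation micro-step, one possible workaround is to recompute the reference forward inside every micro-step so each backward gets a fresh graph. A sketch with stand-in modules (reference_net / denoising_net here are tiny Linear layers, not the repo's UNets):

```python
import torch
import torch.nn as nn

# Hedged sketch (stand-in modules, not the repo's training loop):
# recomputing the reference forward on every accumulation micro-step
# means each backward frees only its own graph, so no graph is walked twice.
reference_net = nn.Linear(4, 4)
denoising_net = nn.Linear(4, 1)
opt = torch.optim.SGD(
    list(reference_net.parameters()) + list(denoising_net.parameters()), lr=1e-3
)

accum_steps = 2
for _ in range(accum_steps):
    batch = torch.randn(8, 4)
    ref_feats = reference_net(batch)                   # recomputed each micro-step
    loss = denoising_net(ref_feats).mean() / accum_steps
    loss.backward()                                    # backward through a fresh graph
opt.step()
opt.zero_grad()
```

The trade-off is extra compute per step. The alternative, loss.backward(retain_graph=True), keeps the freed graph alive instead, but the added memory may defeat the purpose of gradient checkpointing in the first place.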

antoinedelplace avatar Nov 12 '24 14:11 antoinedelplace