InfoPro-Pytorch
why zero_grad() before train iter
Hi. Thanks for the great idea and for sharing the code. I have a question. You moved runner.optimizer.zero_grad() from after_train_iter to before_train_iter; see https://github.com/blackfeather-wang/InfoPro-Pytorch/blob/a38362a2b59949a82b7451af7b95dc0b8da31a0b/Semantic%20segmentation/mmsegmentation-master/mmseg/apis/train.py#L26
I was wondering what this change does and how it affects training and model performance. I adopted your idea for a UNet in medical segmentation but missed the aforementioned change. Yet the training converges correctly, as expected. Now that I have noticed the change, I would like to know its intended effect. Thank you again.
Thank you for your attention! In fact, in the forward() function, we iteratively execute the feed-forward and back-propagation passes of each local module. If runner.optimizer.zero_grad() is placed in after_train_iter, it runs after forward() but before the final backward pass, so the gradients of all local modules except the last one are cleared before optimizer.step() is called. In other words, only the last module would be trained.
The network will of course converge. But the performance may drop:)
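A minimal sketch of the effect described above (hypothetical module names, not the repo's actual hook code; it assumes two local modules where the first one calls backward() on its local loss inside forward(), as InfoPro does):

```python
import torch

# Two "local modules": m1 is trained by a local loss inside forward(),
# m2 (the last module) is trained by the loss returned to the runner.
m1 = torch.nn.Linear(4, 4)
m2 = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(list(m1.parameters()) + list(m2.parameters()), lr=0.1)

def forward_with_local_backward(x):
    h = m1(x)
    local_loss = h.pow(2).mean()
    local_loss.backward()        # back-propagate the local loss into m1
    h = h.detach()               # gradients do not flow across local modules
    return m2(h).pow(2).mean()   # final loss, returned to the runner

x = torch.randn(8, 4)

# Correct order (zero_grad in before_train_iter, i.e. before forward()):
opt.zero_grad(set_to_none=False)
loss = forward_with_local_backward(x)
loss.backward()
assert m1.weight.grad.abs().sum() > 0   # m1 keeps its local gradients
opt.step()

# Wrong order (zero_grad at the start of after_train_iter,
# i.e. after forward() but before the final backward):
loss = forward_with_local_backward(x)
opt.zero_grad(set_to_none=False)        # wipes m1's local gradients...
loss.backward()                         # ...only m2 receives gradients now
assert m1.weight.grad.abs().sum() == 0  # m1 would never be updated
opt.step()
```

With the wrong order, every optimizer.step() sees zero gradients for all local modules except the last, which is why the network still converges but performance can drop.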
Got it. Thanks.