InfoPro-Pytorch
why zero_grad() before train iter
Hi. Thanks for the great idea and for sharing the code. I have a question. You moved runner.optimizer.zero_grad() from after_train_iter to before_train_iter; see https://github.com/blackfeather-wang/InfoPro-Pytorch/blob/a38362a2b59949a82b7451af7b95dc0b8da31a0b/Semantic%20segmentation/mmsegmentation-master/mmseg/apis/train.py#L26
I was wondering what this change does and how it affects training and model performance. I adopted your idea for a UNet in medical segmentation but missed the aforementioned change. Yet the training converges correctly, as expected. Now that I have noticed the change, I would like to know its intended effect. Thank you again.
Thank you for your attention! In fact, in the forward() function, we iteratively execute the feed-forward and back-propagation passes of each local module. If runner.optimizer.zero_grad() is placed in after_train_iter, it runs after forward() but before the final backward pass, so the gradients of all local modules except the last one are cleared before optimizer.step() is called. In other words, only the last module would be trained.
The network will of course converge. But the performance may drop:)
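A minimal sketch of the effect described above (hypothetical module names, not the repo's actual hook code; it assumes two local modules where the first one calls backward() on its local loss inside forward(), as InfoPro does):

```python
import torch

# Two "local modules": m1 is trained by a local loss inside forward(),
# m2 (the last module) is trained by the loss returned to the runner.
m1 = torch.nn.Linear(4, 4)
m2 = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(list(m1.parameters()) + list(m2.parameters()), lr=0.1)

def forward_with_local_backward(x):
    h = m1(x)
    local_loss = h.pow(2).mean()
    local_loss.backward()        # back-propagate the local loss into m1
    h = h.detach()               # gradients do not flow across local modules
    return m2(h).pow(2).mean()   # final loss, returned to the runner

x = torch.randn(8, 4)

# Correct order (zero_grad in before_train_iter, i.e. before forward()):
opt.zero_grad(set_to_none=False)
loss = forward_with_local_backward(x)
loss.backward()
assert m1.weight.grad.abs().sum() > 0   # m1 keeps its local gradients
opt.step()

# Wrong order (zero_grad at the start of after_train_iter,
# i.e. after forward() but before the final backward):
loss = forward_with_local_backward(x)
opt.zero_grad(set_to_none=False)        # wipes m1's local gradients...
loss.backward()                         # ...only m2 receives gradients now
assert m1.weight.grad.abs().sum() == 0  # m1 would never be updated
opt.step()
```

With the wrong order, every optimizer.step() sees zero gradients for all local modules except the last, which is why the network still converges but performance can drop.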
Got it. Thanks.