
Why use zero_module?

Open ZhangMingKun1 opened this issue 1 year ago • 2 comments

Thanks for your code for the project! It is really nice work!

I am confused about why zero_module is used, since it may lead to a zero gradient between the input and the output. Is it possible to correctly train the model parameters with the expected gradients?

ZhangMingKun1, Sep 17 '23 15:09

Is it true that zero_module is the cause of the zero gradient? I'm not sure about this.

By the way, we adopted the zero module from a previous work, but on its own it also has a positive effect, namely faster learning (as shown in the previous works).

phizaz, Sep 28 '23 18:09
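
For readers following the thread, here is a toy sketch of the question being discussed. It is not the diffae code. It assumes that zero_module simply sets the parameters of the last layer of a residual branch to zero at initialization, and it uses a scalar block with hand-derived gradients instead of a real network. All names (`w_in`, `w_out`, `forward_backward`) are hypothetical.

```python
# Toy scalar residual block: y = x + w_out * relu(w_in * x)
# w_out stands in for the zero-initialized layer (assumption, not diffae code).

def forward_backward(x, target, w_in, w_out):
    """Return the loss L = 0.5 * (y - target)**2 and its hand-derived gradients."""
    pre = w_in * x
    h = pre if pre > 0.0 else 0.0            # ReLU
    y = x + w_out * h                        # skip connection + branch
    dy = y - target                          # dL/dy
    g_out = dy * h                           # dL/dw_out: does not depend on w_out
    dpre = dy * w_out * (1.0 if pre > 0.0 else 0.0)
    g_in = dpre * x                          # dL/dw_in: scaled by w_out
    g_x = dy + dpre * w_in                   # dL/dx: skip path plus branch path
    return 0.5 * dy * dy, g_in, g_out, g_x

x, target, lr = 2.0, 3.0, 0.1
w_in, w_out = 0.5, 0.0                       # w_out starts at zero

# Step 0: the zeroed layer blocks the gradient to w_in, but not to itself
# and not to the input (which still receives gradient through the skip path).
loss0, g_in0, g_out0, g_x0 = forward_backward(x, target, w_in, w_out)

# One plain SGD update.
w_in -= lr * g_in0
w_out -= lr * g_out0

# Step 1: w_out is no longer zero, so w_in now receives a gradient too.
loss1, g_in1, g_out1, g_x1 = forward_backward(x, target, w_in, w_out)

print("step 0:", loss0, g_in0, g_out0, g_x0)
print("step 1:", loss1, g_in1, g_out1, g_x1)
```

In this toy setting the zeroed layer only blocks the gradient to the earlier weight for the first update. The zeroed weight itself gets a nonzero gradient, moves away from zero, and the rest of the branch starts training from the next step on. Whether this matches the behavior the original poster observed in diffae is not confirmed in the thread.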

Thank you for your feedback! Could you please provide the paper title or GitHub link of the "previous works"?

ZhangMingKun1, Sep 29 '23 11:09