DeepSpeed
Details about backward hooks in stage 3: why detach outputs?
Dear authors,
Thank you for the awesome work. I am trying to learn some implementation details and came across a small question: I am not sure what the following two lines are for. I believe the behavior would be the same if they were removed, and removing them might save some temporary memory.
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage3.py#L503
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage3.py#L528
+1 @tjruwase I also wondered why there is a detach in the backward hook. Doesn't it break the computational graph? Yet DeepSpeed ZeRO-3 still runs fine.
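To make the graph question concrete, here is a minimal sketch of the pattern being discussed. It is not DeepSpeed's code. It assumes that the linked lines detach the outputs inside the `forward` of a custom `torch.autograd.Function` whose `backward` runs a hook. The class name `HypotheticalPreBackward` and the `hook` callable are illustrative only. The sketch checks whether gradients still reach the parameter and whether the hook fires after the detach.

```python
import torch


class HypotheticalPreBackward(torch.autograd.Function):
    """Hypothetical stand-in for a ZeRO-3 style backward hook (not DeepSpeed's code)."""

    @staticmethod
    def forward(ctx, hook, outputs):
        ctx.hook = hook
        # Autograd is disabled inside forward(). apply() attaches whatever tensor
        # is returned here to the graph, with this Function as its grad_fn.
        # detach() returns a new tensor that shares storage with `outputs`.
        return outputs.detach()

    @staticmethod
    def backward(ctx, grad_output):
        ctx.hook()  # illustrative: where a real implementation might do pre-backward work
        # One gradient per forward input: None for `hook`, pass-through for `outputs`.
        return None, grad_output


calls = []
w = torch.ones(3, requires_grad=True)
y = w * 2.0
z = HypotheticalPreBackward.apply(lambda: calls.append("hook"), y)
z.sum().backward()

print(w.grad)                        # tensor([2., 2., 2.])
print(calls)                         # ['hook']
print(z.grad_fn is not None)         # True: z is still connected to the graph
print(z.data_ptr() == y.data_ptr())  # True: detach() did not copy the data
```

In this toy setting, the detach inside `forward` does not cut the graph, because `apply()` reconnects the returned tensor through the Function's `backward`. The detached tensor also shares storage with its input. Whether the linked DeepSpeed lines behave exactly like this depends on code not reproduced here.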