RandMist
> @RandMist @yaox12 Hello~ Our experiments found that after applying this change, the output can sometimes become **inf**. This subsequently leads to NaN gradients for the corresponding token during the...
> The goal of this kernel was to avoid saving the input for backward: the gradients are written onto the input tensor itself to reduce the peak...