Hongbin Liu
+1 Very nice feature
Try disabling the recomputation.
I am not sure disabling recomputation will work in your case; maybe you can double-check by enabling recomputation and using 16 layers. I think the loss diff is...
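For reference, here is a minimal sketch of that A/B check using plain `torch.utils.checkpoint` (a toy 16-layer stack and a dummy loss, not the actual model): run the same batch with and without activation recomputation and compare the losses.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(16)])  # 16 layers, as suggested

def forward(x: torch.Tensor, recompute: bool) -> torch.Tensor:
    for layer in layers:
        if recompute:
            # Recompute activations during backward instead of storing them.
            x = checkpoint(layer, x, use_reentrant=False)
        else:
            x = layer(x)
    return x.pow(2).mean()  # dummy loss

x = torch.randn(8, 64, requires_grad=True)
loss_plain = forward(x, recompute=False)
loss_recompute = forward(x, recompute=True)
print(f"loss diff: {(loss_plain - loss_recompute).abs().item():.3e}")  # should be ~0
```

If the diff is non-negligible with recomputation enabled, the discrepancy is likely in the recompute path rather than in this change.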
> @lhb8125 could you please sign off your commits ([guide](https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work))?

@ksivaman Rewriting the commit history is too tricky. Could you switch to [this MR](https://github.com/NVIDIA/TransformerEngine/pull/1653), where all commits are signed off?
Any comments?
@jxxghp
> Did you try storing the logits in bf16? It could save a lot of memory. Not sure if we need this fusion.

@kvareddy Jack has given a good case...
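For illustration, a minimal sketch (toy shapes, not the PR's kernel) of why keeping logits in bf16 rather than upcasting to fp32 halves their memory footprint:

```python
import torch

tokens, hidden, vocab = 2048, 1024, 32_000            # toy sizes
h = torch.randn(tokens, hidden, dtype=torch.bfloat16)
w = torch.randn(hidden, vocab, dtype=torch.bfloat16)

logits_bf16 = h @ w                 # kept in bf16: 2 bytes per element
logits_fp32 = logits_bf16.float()   # upcast to fp32: 4 bytes per element

mib = 2**20
print(f"bf16 logits: {logits_bf16.numel() * logits_bf16.element_size() / mib:.0f} MiB")
print(f"fp32 logits: {logits_fp32.numel() * logits_fp32.element_size() / mib:.0f} MiB")
# A fused cross-entropy can consume bf16 logits and upcast per-tile inside
# the kernel, so the full fp32 tensor never needs to be materialized.
```

At real vocabulary and sequence sizes the fp32 logits tensor can dominate activation memory, which is the trade-off being weighed against the fusion here.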
/ok to test 7073437
/ok to test 619bb51