2 comments of pi-tau
Hi, it's quite interesting that the extra final activation worsens the results. Thanks for sharing.
Hi, thanks for the useful info; I didn't know that. In that case, when training with SGD or SGD with momentum, would you simply clip the gradient norm to 1.0?
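For reference, "clip the gradient norm to 1.0" usually means computing a single L2 norm over all gradients and, if that norm exceeds 1.0, rescaling every gradient by the same factor before the optimizer step. In PyTorch this is commonly done with `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` between `loss.backward()` and `optimizer.step()`. The plain-Python helpers below (`clip_by_global_norm`, `sgd_momentum_step`) are hypothetical illustrations working on lists of floats. They are a sketch of the idea, not anyone's actual training code.

```python
import math
from typing import List, Tuple

Tensors = List[List[float]]  # hypothetical stand-in for a list of parameter tensors


def clip_by_global_norm(grads: Tensors, max_norm: float = 1.0) -> Tensors:
    """Rescale all gradients so their combined L2 norm is at most max_norm."""
    # One norm over every gradient entry of every tensor (global-norm clipping).
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm <= max_norm:
        return [list(tensor) for tensor in grads]  # already small enough: unchanged
    scale = max_norm / total_norm  # same factor for all tensors keeps the direction
    return [[g * scale for g in tensor] for tensor in grads]


def sgd_momentum_step(
    params: Tensors,
    grads: Tensors,
    velocity: Tensors,
    lr: float = 0.1,
    momentum: float = 0.9,
    max_norm: float = 1.0,
) -> Tuple[Tensors, Tensors]:
    """One SGD+momentum update (v = momentum * v + g; p = p - lr * v) on clipped grads."""
    clipped = clip_by_global_norm(grads, max_norm)
    new_velocity = [
        [momentum * v + g for v, g in zip(v_t, g_t)]
        for v_t, g_t in zip(velocity, clipped)
    ]
    new_params = [
        [p - lr * v for p, v in zip(p_t, v_t)]
        for p_t, v_t in zip(params, new_velocity)
    ]
    return new_params, new_velocity


if __name__ == "__main__":
    # A gradient of [3, 4] has norm 5, so it is scaled down to roughly [0.6, 0.8].
    print(clip_by_global_norm([[3.0, 4.0]], max_norm=1.0))
    params, vel = sgd_momentum_step([[0.0, 0.0]], [[3.0, 4.0]], [[0.0, 0.0]])
    print(params, vel)
```

Clipping happens before the gradient enters the momentum buffer, so the velocity only ever accumulates bounded gradients. This is the same ordering you get by calling the clipping utility just before `optimizer.step()`.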