pi-tau

2 comments by pi-tau

Hi, it is quite interesting that the extra final activation worsens the results. Thanks for sharing.

Hi, thanks for the useful info, I didn't know that. In that case, when training with SGD or SGD+Momentum, would you simply clip the gradient norm to 1.0?
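
To make the question concrete, here is a minimal sketch of what clipping the gradient norm to 1 before an SGD+Momentum update could look like. It assumes clipping by the global L2 norm and uses plain Python lists instead of a real framework. The names `clip_grad_norm` and `sgd_momentum_step`, as well as the learning rate and momentum values, are hypothetical and only for illustration; they are not any library's API.

```python
import math


def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale grads so their global L2 norm is at most max_norm.

    Hypothetical helper: returns the (possibly rescaled) gradients
    and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale is 1.0 when the norm is already within the limit.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One SGD+Momentum update (assumed form: v = m * v + g, p = p - lr * v)."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy example: a gradient with L2 norm 5 is rescaled to norm ~1 before the step.
params = [1.0, 1.0]
velocity = [0.0, 0.0]
grads = [3.0, 4.0]

clipped, norm_before = clip_grad_norm(grads, max_norm=1.0)
params, velocity = sgd_momentum_step(params, clipped, velocity)
```

Note that in this sketch the clipping happens before the momentum buffer is updated, so the accumulated velocity can still exceed the clipping threshold. Whether that is the desired behaviour is part of what I am asking.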