Results 2 comments of

> Thanks for reaching out, [@yang123-xd](https://github.com/yang123-xd)!
>
> Have you considered using a smaller learning rate? The random fluctuations in this case could indeed indicate that training dynamics...

> Thanks, [@yang123-xd](https://github.com/yang123-xd), for raising the issue. While it is hard for me to pinpoint the exact root cause, here are a couple of things that come to mind:
>
> ...