About the training loss

Open nini0919 opened this issue 11 months ago • 2 comments

Thank you very much for your great work! I would like to ask for your advice: when I reproduced your training code, the mean reward showed an upward trend, but the loss computed from the probability distribution hardly decreased and stayed at around 0.693. Is this normal? Looking forward to your response.
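For reference, 0.693 is approximately ln 2, which is the value of -log(sigmoid(0)): a two-way logistic loss sits there when both options are scored equally. The sketch below only illustrates that arithmetic. It assumes the loss has a sigmoid-of-margin form, which this thread does not confirm, and `pairwise_logistic_loss` is a hypothetical name, not a function from the repository.

```python
import math


def pairwise_logistic_loss(margin: float) -> float:
    """Return -log(sigmoid(margin)), computed stably as softplus(-margin).

    `margin` stands for the score gap between the preferred and the other
    option (an assumed form, not taken from the repository).
    """
    x = -margin
    # softplus(x) = log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# A margin of 0 gives ln 2; the loss only drops as the margin grows.
for margin in (0.0, 0.1, 1.0, 5.0):
    print(f"margin={margin:>4}: loss={pairwise_logistic_loss(margin):.4f}")
```

If the loss does have this form, a value pinned near 0.693 would mean the margin is staying close to zero, so logging the margin alongside the loss may help narrow down the cause.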

Jan 10 '25 12:01 nini0919

This is my question as well!

Jan 30 '25 07:01 AHHHZ975

I was wondering if you’ve come to a conclusion/answer on this question. @nini0919

Jan 31 '25 04:01 AHHHZ975