d3po
About the training loss
Thank you very much for your great work! I would like to ask for your advice: when I reproduced the training with your code, the mean reward trended upward, but the loss computed from the probability distribution barely decreased and stayed at around 0.693. Is this normal? Looking forward to your response.
This is my question as well!
I was wondering if you’ve come to a conclusion/answer on this question. @nini0919
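For context on the number itself: 0.693 is approximately ln 2, which is the value a logistic preference loss of the form -log(sigmoid(x)) takes at x = 0. Below is a minimal sketch, assuming the training loss has this DPO-style form. That form, the function name `preference_loss`, its arguments, and the `beta` value are all illustrative assumptions and are not taken from the repository code.

```python
import math

def preference_loss(logratio_preferred, logratio_rejected, beta=0.1):
    """Hypothetical DPO-style loss: -log(sigmoid(beta * (preferred - rejected)))."""
    margin = beta * (logratio_preferred - logratio_rejected)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# With a zero margin the loss is exactly ln 2.
print(preference_loss(0.0, 0.0))  # 0.6931471805599453
print(math.log(2))                # 0.6931471805599453

# A small positive margin moves the loss only slightly below ln 2.
print(preference_loss(0.05, 0.0))
```

Under that assumption, a loss that stays near 0.693 means the scaled margin between the preferred and rejected samples stays close to zero. The sketch does not by itself say whether that is expected behavior for this repository.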