chelydrae
I am facing the same problem. It is not solved yet; however, I have located the issue: in GaussianPolicy -> sample(), the policy output x_t is transformed by tanh(x_t). If...
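To make the transformation concrete, here is a minimal scalar sketch of a tanh-squashed Gaussian sample in plain Python. It assumes the usual SAC-style recipe (reparameterized Gaussian sample, tanh squashing, change-of-variables correction to the log-probability) and a hypothetical `EPSILON = 1e-6`. The function name `squashed_gaussian_sample` is illustrative only and is not the repository's actual `GaussianPolicy.sample()`.

```python
import math
import random

# Hypothetical constant (assumption): keeps log(1 - tanh(x)^2) finite
EPSILON = 1e-6


def squashed_gaussian_sample(mean, log_std, rng=random):
    """Hypothetical scalar sketch of a tanh-squashed Gaussian sample.

    Returns (x_t, y_t, log_prob), where x_t is the raw Gaussian sample,
    y_t = tanh(x_t) is the squashed action, and log_prob is the
    log-density of y_t after the change-of-variables correction.
    """
    std = math.exp(log_std)
    # Reparameterization: x_t = mean + std * noise, noise ~ N(0, 1)
    noise = rng.gauss(0.0, 1.0)
    x_t = mean + std * noise
    # Squash the unbounded sample into [-1, 1]
    y_t = math.tanh(x_t)
    # Log-density of x_t under N(mean, std^2)
    log_prob = -0.5 * noise ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for y = tanh(x): subtract log |dy/dx| = log(1 - tanh(x)^2)
    log_prob -= math.log(1.0 - y_t ** 2 + EPSILON)
    return x_t, y_t, log_prob


if __name__ == "__main__":
    print(squashed_gaussian_sample(0.0, 0.0, random.Random(0)))
```

Once `tanh(x_t)` rounds to ±1 in floating point, `1 - y_t**2` is exactly 0 and the correction term is bounded only by `EPSILON`.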
> > this is equal to 1 or -1 for most of the arguments besides a small area between -5 and 5
> >
> > Can you explain a little bit...
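The quoted claim can be checked numerically. In exact arithmetic tanh never reaches ±1, but for |x| ≥ 5 it is within about 1e-4 of ±1, its derivative 1 - tanh(x)^2 is correspondingly tiny, and in double precision it rounds to exactly 1.0 for large enough x (e.g. x = 25). The helper names `tanh_gap` and `tanh_grad` are hypothetical, introduced only for this check.

```python
import math


def tanh_gap(x):
    # Hypothetical helper: distance of tanh(x) from its asymptote +/-1
    return 1.0 - abs(math.tanh(x))


def tanh_grad(x):
    # Hypothetical helper: d/dx tanh(x) = 1 - tanh(x)^2 (what backprop sees)
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in [0.0, 1.0, 2.0, 3.0, 5.0, 10.0, 25.0]:
        print(f"x={x:5.1f}  gap={tanh_gap(x):.3e}  grad={tanh_grad(x):.3e}")
```

The gradient column makes the practical consequence visible: outside roughly [-5, 5], very little gradient flows back through the squashing step.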
> So the code is working on your side with the learned temperature and w/o modification?

Yes, it works without modification. If I have time in the evening I'll...