rlpyt

A question about CategoricalPgAgent

Open · LTEnjoy opened this issue

Hi! Thanks for the thorough description of the library. I ran into a question while reading it. When I look at how CategoricalPgAgent relates to MujocoFfModel, I see that MujocoFfModel's forward function returns mu, log_std, v, but according to the source code, CategoricalPgAgent receives pi, value. Does pi here mean the same thing as mu? If so, when I run the code, it raises the error 'probability tensor contains either inf, nan or element < 0', and I find that some elements of the vector pi are < 0. That seems wrong, because pi stands for the probability of each action, doesn't it? I hope you can help me understand this better.
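To make the mismatch concrete, here is a minimal sketch in plain Python. It is not rlpyt code: `softmax` and `sample_categorical` are hypothetical helpers. The sketch assumes that a categorical policy expects a normalized probability vector, while a Gaussian head's mean mu is unconstrained and can be negative. The sampler mimics the kind of validity check that produces the error quoted above.

```python
import math
import random

def softmax(logits):
    """Map unconstrained scores to a probability vector (non-negative, sums to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_categorical(probs, rng=random):
    """Hypothetical sampler that rejects invalid probabilities before sampling."""
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise ValueError("probability vector contains either inf, nan or element < 0")
    # Draw one action index, weighted by the given probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# A Gaussian head's mean (mu) is unconstrained, so it can contain negative values.
mu = [0.7, -1.2, 0.3]

# Treating mu as if it were pi fails the validity check.
try:
    sample_categorical(mu)
except ValueError as err:
    print("error:", err)

# A categorical head would instead normalize its outputs, e.g. with softmax.
pi = softmax(mu)
print([round(p, 3) for p in pi], round(sum(pi), 6))
action = sample_categorical(pi, random.Random(0))
print("sampled action:", action)
```

Under this assumption, passing a Gaussian model's mu where a probability vector is expected would trigger the same kind of error. Normalizing the outputs (here with softmax) keeps every element non-negative and summing to 1.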

LTEnjoy · Feb 02 '21 05:02