rlpyt
A question about CategoricalPgAgent
Hi! Thanks for the careful description of the library. I ran into a question while reading it.
When I looked at how CategoricalPgAgent relates to MujocoFfModel, I found that MujocoFfModel's forward function outputs mu, log_std, v, but according to the source code CategoricalPgAgent expects to receive pi, value. Does pi here have the same meaning as mu? If so, when I run the code I get the error 'probability tensor contains either inf, nan or element < 0', and I find that some elements of the vector pi are < 0. That seems unreasonable, because pi stands for the probability of each action, doesn't it? I hope you can help me understand this better.
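
To make the mismatch concrete, here is a minimal sketch in plain Python (it is not rlpyt code). The values in mu are made up, and is_valid_probability_vector and softmax are hypothetical helpers written only for this illustration. It shows that an unconstrained vector, such as the mean of a Gaussian policy, can contain negative entries and so is not a valid categorical distribution, while a softmax over the same values is.

```python
import math


def is_valid_probability_vector(p, tol=1e-6):
    """Return True if all entries are finite and non-negative and sum to 1."""
    return (
        all(math.isfinite(x) and x >= 0.0 for x in p)
        and abs(sum(p) - 1.0) <= tol
    )


def softmax(logits):
    """Map unconstrained real values to a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values standing in for a Gaussian policy's mean output (mu).
mu = [0.3, -1.2, 0.5]

print(is_valid_probability_vector(mu))           # False: one entry is negative
print(is_valid_probability_vector(softmax(mu)))  # True: non-negative, sums to 1
```

If this is what is happening in my run, then pi and mu would not be interchangeable: mu can take any real value, whereas a categorical sampler needs non-negative, finite probabilities.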