ptan
Policy functions should use torch functions instead of NumPy
Score functions and categorical sampling are already implemented in PyTorch (https://pytorch.org/docs/stable/distributions.html), so using NumPy for them should be discouraged. The policy network should output a probability distribution object rather than raw probabilities that are then sampled with NumPy.
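A minimal sketch of what this could look like, assuming a simple discrete-action setting. `PolicyNet`, its layer sizes, and the placeholder returns are hypothetical and are not part of ptan's existing API. The network returns a `torch.distributions.Categorical`, so sampling and the score-function term `log_prob` stay in torch (same device, autograd intact) with no NumPy round-trip:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class PolicyNet(nn.Module):
    """Hypothetical policy network that returns a distribution instead of raw probabilities."""

    def __init__(self, obs_size: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        # Passing logits lets Categorical normalize them internally in a numerically stable way.
        return Categorical(logits=self.body(obs))


torch.manual_seed(0)
net = PolicyNet(obs_size=4, n_actions=2)
obs = torch.randn(8, 4)              # batch of 8 fake observations (illustrative only)
dist = net(obs)
actions = dist.sample()              # one action index per observation, sampled in torch
log_probs = dist.log_prob(actions)   # score-function term log pi(a|s), differentiable

# REINFORCE-style surrogate loss; the returns here are placeholders, assumed for illustration.
returns = torch.ones(8)
loss = -(log_probs * returns).mean()
loss.backward()                      # gradients flow back into the network parameters
```

Because the distribution object also exposes `entropy()`, an entropy bonus can be added to the same loss without converting anything to NumPy.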