Youtube-Code-Repository
ActorNetwork - sample_normal method log_probs issue
The following line can produce NaN when 'self.max_action' is greater than 1. Because 'action' is scaled by 'self.max_action', its magnitude can then exceed 1. That makes 1 - action.pow(2) + self.reparam_noise negative, and the logarithm of a negative number is NaN.
log_probs -= T.log(1-action.pow(2)+self.reparam_noise)
https://github.com/philtabor/Youtube-Code-Repository/blob/a6006478809f3c00026b6ce921a2d4a23b4b1df9/ReinforcementLearning/PolicyGradient/SAC/networks.py#L130
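The sketch below reproduces the problem with plain Python floats and shows one possible fix. It assumes, as in the linked networks.py, that 'action' is tanh of the Normal sample scaled by 'self.max_action'. For action = max_action * tanh(u), the change of variables gives d(action)/du = max_action * (1 - tanh(u)^2). The correction term should therefore use the unscaled tanh(u), not the scaled action. The helper names (softplus, normal_log_prob, log_prob_reported, log_prob_rescaled) are hypothetical and not part of the repository.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_log_prob(u, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at u."""
    return (-((u - mu) ** 2) / (2 * sigma ** 2)
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))


def log_prob_reported(u, mu, sigma, max_action, reparam_noise=1e-6):
    """Mirrors the reported line: the correction uses the *scaled* action."""
    action = math.tanh(u) * max_action
    arg = 1 - action ** 2 + reparam_noise
    if arg < 0:
        return float("nan")  # torch.log yields NaN here; math.log would raise
    if arg == 0:
        return float("inf")  # torch.log yields -inf at 0; subtracting it gives +inf
    return normal_log_prob(u, mu, sigma) - math.log(arg)


def log_prob_rescaled(u, mu, sigma, max_action):
    """Hypothetical fix: log-density of action = max_action * tanh(u).

    log pi(action) = log N(u) - log(1 - tanh(u)^2) - log(max_action),
    with log(1 - tanh(u)^2) computed as 2 * (log 2 - u - softplus(-2u))
    so it stays finite even when tanh(u) rounds to +/-1.
    """
    log_one_minus_tanh_sq = 2 * (math.log(2) - u - softplus(-2 * u))
    return (normal_log_prob(u, mu, sigma)
            - log_one_minus_tanh_sq - math.log(max_action))


# max_action = 2 and u = 1.5: tanh(1.5) is about 0.905, so action is about 1.81
print(log_prob_reported(1.5, 0.0, 1.0, max_action=2.0))  # nan
print(log_prob_rescaled(1.5, 0.0, 1.0, max_action=2.0))  # finite (about -1.03)
```

The -log(max_action) term is a constant, so dropping it would not change gradients with respect to the policy parameters. It does change the reported log-probability value. In PyTorch, the same idea could be written with T.tanh(actions) in place of 'action' inside the logarithm, or with torch.nn.functional.softplus for the stable form.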