Youtube-Code-Repository

ActorNetwork - sample_normal method log_probs issue

Open. zenineasa opened this issue 1 year ago (0 comments).

The following line can break if 'self.max_action' is greater than 1. Because 'action' is scaled by 'self.max_action', it can then exceed 1 in magnitude, and the argument of the logarithm, 1 - action^2 + self.reparam_noise, becomes negative. The logarithm of a negative number returns NaN, which then propagates into log_probs.

log_probs -= T.log(1-action.pow(2)+self.reparam_noise)

https://github.com/philtabor/Youtube-Code-Repository/blob/a6006478809f3c00026b6ce921a2d4a23b4b1df9/ReinforcementLearning/PolicyGradient/SAC/networks.py#L130
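A minimal sketch of the problem and one possible fix follows. It uses plain Python and the math module so it runs without PyTorch. It assumes, as the issue implies, that action = tanh(u) * max_action, where u is the sampled pre-squash value. REPARAM_NOISE and torch_like_log are hypothetical stand-ins for self.reparam_noise and T.log; torch_like_log exists because math.log raises ValueError on negative input instead of returning NaN. The fix applies the tanh change-of-variables correction to the unscaled tanh output and adds log(max_action) for the scaling, since d(action)/du = max_action * (1 - tanh(u)^2). Each correction_* function returns the value that would be subtracted from log_probs. This is one common approach, not necessarily the repository author's intended fix.

```python
import math

REPARAM_NOISE = 1e-6  # hypothetical stand-in for self.reparam_noise


def torch_like_log(x: float) -> float:
    """Mimic T.log on a scalar: NaN for x < 0, -inf for x == 0."""
    if x < 0:
        return float("nan")
    if x == 0:
        return float("-inf")
    return math.log(x)


def correction_as_in_issue(u: float, max_action: float) -> float:
    """Correction term from the quoted line: it uses the *scaled* action."""
    action = math.tanh(u) * max_action
    return torch_like_log(1 - action ** 2 + REPARAM_NOISE)


def correction_scaled(u: float, max_action: float) -> float:
    """log|d action / du| for action = max_action * tanh(u).

    The unscaled tanh output y stays within [-1, 1], so 1 - y**2 + noise
    is positive no matter how large max_action is.
    """
    y = math.tanh(u)
    return math.log(max_action) + math.log(1 - y ** 2 + REPARAM_NOISE)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def correction_stable(u: float, max_action: float) -> float:
    """Same quantity without the noise term, using the identity
    log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u))."""
    return math.log(max_action) + 2 * (math.log(2) - u - softplus(-2 * u))


if __name__ == "__main__":
    u, max_action = 1.0, 2.0
    print(correction_as_in_issue(u, max_action))  # nan
    print(correction_scaled(u, max_action))       # finite
    print(correction_stable(u, max_action))       # finite, close to the line above
```

With max_action = 1 the original formula and correction_scaled agree, so the bug only shows up once the action is scaled. The softplus form is an algebraically equivalent rewrite of log(1 - tanh(u)^2). It avoids the noise term entirely and stays accurate even when tanh(u) rounds to exactly 1 in floating point.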

Feb 24 '23 22:02 zenineasa
