policy-gradient
Why normalize predicted probabilities?
prob = aprob / np.sum(aprob)
https://github.com/keon/policy-gradient/blob/master/pg.py#L46
I am not sure this line is really required, since the probabilities should already be normalized by the softmax output layer. Please let me know in case I am missing something.
I think this line of code has no effect.
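A quick numerical sketch of the question. The `softmax` below is a hypothetical stand-in for the network's output layer in pg.py (the real `aprob` comes from the Keras model); it shows that the division is mathematically a no-op, though it can still serve as a floating-point safety net:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])
aprob = softmax(logits)

# Softmax already normalizes: the sum is 1 up to floating-point error,
# so dividing by the sum changes (essentially) nothing.
prob = aprob / np.sum(aprob)

# One practical reason to keep the line: np.random.choice validates
# that p sums to 1 within a tight tolerance, and rounding error in the
# model's output (e.g. float32 activations) can occasionally trip that
# check; renormalizing guards against a ValueError here.
action = np.random.choice(len(prob), p=prob)
print(np.sum(prob), action)
```

So the line is redundant in exact arithmetic, but arguably harmless defensive code given how `np.random.choice` checks its `p` argument.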