intuitive_policy_gradient
Uploaded .gif is not consistent with the discussion in the notebook
The notebook states: "Here we can see that the third action, in spite of having a lower value than the other two, ends up winning because it starts out initialized much higher." This is not consistent with the uploaded "bad.gif" file (which corresponds to that section of the notebook), where the first action still wins!
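To check which outcome a given setup produces, here is a minimal sketch of an exact (expected) softmax policy gradient on a three-action bandit. It is not the notebook's code: the update rule, the function names `softmax` and `run_policy_gradient`, and all values, initial logits, learning rate, and step count below are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_policy_gradient(values, logits, lr=0.1, steps=1000):
    """Run exact softmax policy gradient ascent; return final action probabilities.

    Hypothetical helper: the notebook may use a different update rule.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * v for p, v in zip(probs, values))
        # Gradient of the expected value w.r.t. logit i is p_i * (v_i - E[v]).
        logits = [
            l + lr * p * (v - expected)
            for l, p, v in zip(logits, probs, values)
        ]
    return softmax(logits)


# Assumed numbers: the third action has the lowest value but the highest initial logit.
values = [2.0, 2.0, 1.0]
init_logits = [0.0, 0.0, 3.0]

probs = run_policy_gradient(values, init_logits)
winner = probs.index(max(probs))
print("final probabilities:", probs)
print("winning action index:", winner)
```

Varying `init_logits`, `lr`, and `steps` shows which action has the highest probability after a given number of updates, which can then be compared against both the notebook text and the .gif. The result depends on these assumed settings, so it only indicates what this particular update rule does.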