regretful-agent
Possible typo in code (or paper)
Hi!
I was looking at the progress monitor, whose calculation the paper gives as:
But this line in the code shows a slightly different equation (one closing parenthesis changed position): https://github.com/chihyaoma/regretful-agent/blob/5caf7b500667981bc7064e4d31b49e83db64c95a/tasks/R2R-pano/models/policy_model.py#L133
This would translate to (in the paper notation):
The difference is that in the first equation the tanh is inside the sigmoid, while in the second it is outside the sigmoid.
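To make the difference concrete, here is a minimal scalar sketch of the two forms. It is not the repository's code: the names `a` (standing in for the linear projection that the sigmoid is applied to) and `c` (standing in for the cell state passed to tanh) are hypothetical, and I am assuming the two terms are combined by a product.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, always in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_inside_sigmoid(a: float, c: float) -> float:
    """Paper-style form (as I read it): sigmoid(a * tanh(c))."""
    return sigmoid(a * math.tanh(c))


def tanh_outside_sigmoid(a: float, c: float) -> float:
    """Code-style form (as I read it): sigmoid(a) * tanh(c)."""
    return sigmoid(a) * math.tanh(c)


# Hypothetical inputs: a positive projection and a negative cell state.
a, c = 2.0, -1.0
inside = tanh_inside_sigmoid(a, c)    # stays positive, since sigmoid is applied last
outside = tanh_outside_sigmoid(a, c)  # takes the sign of tanh(c), so it is negative here
print(f"tanh inside sigmoid:  {inside:.4f}")
print(f"tanh outside sigmoid: {outside:.4f}")
```

Under these assumptions, the first form is always in (0, 1), whereas the second can take any value in (-1, 1), so the two are not equivalent whenever tanh of the cell state is negative.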
Is there any major difference between using these two equations?