option-critic-pytorch
option-critic-pytorch copied to clipboard
Termination prob calculated over current state instead of the next state
The termination probability is calculated over the next state according to the original paper. So it should be using next_obs instead of obs.
https://github.com/lweitkamp/option-critic-pytorch/blob/0c57da7686f8903ed2d8dded3fae832ee9defd1a/option_critic.py#L238
I'm not so sure this is a problem; it's similar to line 128 in the original code. We are already calculating the Q value for s, so perhaps the authors see it as too expensive to calculate it for both s and s' (all terms in the beta update concern s').
I'm going to keep this issue open because I might see what happens if the code is changed.
The option_term_prob gives the option termination probability for the current option and done indicates a transition from current state to the next state. In that case, we need an advantage over the next state. The other way would be to replace the two with prev_option_term_prob and the previous dones, since they are already available to the agent at a given timestep.