option-critic-pytorch icon indicating copy to clipboard operation
option-critic-pytorch copied to clipboard

Termination prob calculated over current state instead of the next state

Open backpropper opened this issue 5 years ago • 2 comments

The termination probability is calculated over the next state according to the original paper. So it should be using next_obs instead of obs.

https://github.com/lweitkamp/option-critic-pytorch/blob/0c57da7686f8903ed2d8dded3fae832ee9defd1a/option_critic.py#L238

backpropper avatar Aug 20 '20 12:08 backpropper

I'm not so sure this is a problem; it's similar to line 128 in the original code. We are already calculating the Q value for s, so perhaps the authors see it as too expensive to calculate it for both s and s' (all terms in the beta update concern s').

I'm going to keep this issue open because I might see what happens if the code is changed.

lweitkamp avatar Aug 23 '20 13:08 lweitkamp

The option_term_prob gives the option termination probability for the current option and done indicates a transition from current state to the next state. In that case, we need an advantage over the next state. The other way would be to replace the two with prev_option_term_prob and the previous dones, since they are already available to the agent at a given timestep.

backpropper avatar Aug 23 '20 14:08 backpropper