Inverse-Reinforcement-Learning
maxent seems to be using max instead of softmax for V_soft?
In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), the soft value function V_soft is updated with a softmax over actions, but maxent.py calls value_iteration.optimal_value, which computes the hard value function, i.e. it uses max instead of softmax. This looks like a bug.
The initialisation also seems off: at least in gridworld settings, only the terminal state should be initialised to 0 while every other state starts at -infinity, but value_iteration.optimal_value initialises everything to 0. Is there a reason for this discrepancy?
Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
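For reference, here is a minimal sketch of what the soft backward pass would look like. This is not the repo's actual code; the argument names and conventions (`transition[s, a, s2]` = P(s2 | s, a), a state-based `reward`, a list of `terminal` states) are assumptions for illustration. The two points from the issue show up as the log-sum-exp backup and the -inf initialisation outside terminal states:

```python
import numpy as np

def soft_value_iteration(transition, reward, discount, terminal, iterations=100):
    """Soft (MaxEnt) backward pass, in the spirit of Algorithm 9.1
    of Ziebart's thesis. transition has shape (S, A, S), reward has
    shape (S,), terminal is a list of goal states."""
    n_states = transition.shape[0]
    # Initialise V_soft to -inf everywhere except the terminal states.
    V = np.full(n_states, -np.inf)
    V[terminal] = 0.0
    for _ in range(iterations):
        # Q_soft(s, a) = r(s) + gamma * E_{s2 ~ P(.|s,a)}[V_soft(s2)].
        # Clamp -inf to a very large negative number so that
        # zero-probability transitions don't produce 0 * -inf = NaN.
        EV = transition @ np.maximum(V, -1e12)
        Q = reward[:, None] + discount * EV
        # V_soft(s) = log sum_a exp(Q_soft(s, a)) -- the softmax backup,
        # instead of the hard max used by value_iteration.optimal_value.
        V = np.logaddexp.reduce(Q, axis=1)
        V[terminal] = 0.0  # goal states stay pinned at 0
    return V
```

Note that the soft value is always at least the hard-max value, since log-sum-exp upper-bounds max, so the two backward passes converge to genuinely different fixed points.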
Did you reach any conclusion on that?
I'm pretty sure it is wrong, it should be the softmax value function and not a hard max. Should be pretty easy to fix though.
Hey, thanks for the comments. It could very well be a bug (including the initialisation of gridworld) — it's been a long while since I've looked at inverse reinforcement learning so I'm not sure. I'm happy to take pull requests if anyone wants to look into this.
Hey guys,
thanks for the responses.
In these papers (Kitani et al.; Ziebart et al.) the stochastic policy is usually defined as something like

    pi(a|s) = exp(Q_soft(s, a) - V_soft(s))
But I wonder whether and how much this matters in practice, because your code apparently still produces reasonable results.