Inverse-Reinforcement-Learning
maxent seems to be using max instead of softmax for V_soft?
In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), the soft value function V_soft is updated with a softmax over actions, but maxent.py calls value_iteration.optimal_value, which computes the hard value function, i.e. it uses max instead of softmax. This looks like a bug.
The initialisation also seems off: at least in gridworld settings, only the terminal state should be initialised to 0 while every other state starts at -infinity, but value_iteration.optimal_value initialises everything to 0. Is there a reason for this discrepancy?
Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
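For reference, here is a minimal sketch of what the soft backward pass would look like. This is not the repo's actual code; the argument names and conventions (`transition[s, a, s2]` = P(s2 | s, a), a state-based `reward`, a list of `terminal` states) are assumptions for illustration. The two points from the issue show up as the log-sum-exp backup and the -inf initialisation outside terminal states:

```python
import numpy as np

def soft_value_iteration(transition, reward, discount, terminal, iterations=100):
    """Soft (MaxEnt) backward pass, in the spirit of Algorithm 9.1
    of Ziebart's thesis. transition has shape (S, A, S), reward has
    shape (S,), terminal is a list of goal states."""
    n_states = transition.shape[0]
    # Initialise V_soft to -inf everywhere except the terminal states.
    V = np.full(n_states, -np.inf)
    V[terminal] = 0.0
    for _ in range(iterations):
        # Q_soft(s, a) = r(s) + gamma * E_{s2 ~ P(.|s,a)}[V_soft(s2)].
        # Clamp -inf to a very large negative number so that
        # zero-probability transitions don't produce 0 * -inf = NaN.
        EV = transition @ np.maximum(V, -1e12)
        Q = reward[:, None] + discount * EV
        # V_soft(s) = log sum_a exp(Q_soft(s, a)) -- the softmax backup,
        # instead of the hard max used by value_iteration.optimal_value.
        V = np.logaddexp.reduce(Q, axis=1)
        V[terminal] = 0.0  # goal states stay pinned at 0
    return V
```

Note that the soft value is always at least the hard-max value, since log-sum-exp upper-bounds max, so the two backward passes converge to genuinely different fixed points.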
Did you reach any conclusion on that?
I'm pretty sure it is wrong, it should be the softmax value function and not a hard max. Should be pretty easy to fix though.
Hey, thanks for the comments. It could very well be a bug (including the initialisation of gridworld) — it's been a long while since I've looked at inverse reinforcement learning so I'm not sure. I'm happy to take pull requests if anyone wants to look into this.
Hey guys,
thanks for the responses.
In these papers (Kitani et al.; Ziebart et al.) the stochastic policy is usually defined as something like

    pi(a|s) = exp(Q_soft(s, a) - V_soft(s))
But I wonder whether and how much this matters in practice, because your code apparently still produces reasonable results.