
maxent seems to be using max instead of softmax for V_soft?

Open mohitsharma0690 opened this issue 8 years ago • 4 comments

In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), the soft value function V is updated with a softmax, but maxent.py calls value_iteration.optimal_value, which computes the hard value function (i.e., it uses max instead of softmax). This looks like a bug.

The initialization also seems odd: at least for the gridworld setting, only the terminal state should be initialized to 0 and all other states to -infinity, but value_iteration.optimal_value initializes everything to 0. Is there a reason for this discrepancy?
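For what it's worth, the initialisation described in Ziebart's backward pass could be sketched like this (my own illustration, not code from this repo; `terminal_states` is a hypothetical parameter):

```python
import numpy as np

def init_soft_values(n_states, terminal_states):
    """Ziebart-style initialisation for the soft backward pass:
    V(s) = 0 for terminal states, -inf everywhere else, so the first
    log-sum-exp backup only propagates value out of terminal states."""
    v = np.full(n_states, -np.inf)
    v[list(terminal_states)] = 0.0
    return v
```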

Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
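To illustrate the difference: here is a minimal sketch of a soft (log-sum-exp) backward pass. The names and shapes are my own assumptions (`transition_probs` of shape `(n_states, n_actions, n_states)`, state-based reward), not the repo's actual API:

```python
import numpy as np

def soft_value_iteration(n_states, n_actions, transition_probs, reward,
                         discount, iterations=100):
    """Soft (MaxEnt) value iteration: V(s) = softmax_a Q(s, a), where
    "softmax" is log-sum-exp over actions, not a hard max.

    transition_probs: array of shape (n_states, n_actions, n_states).
    reward: array of shape (n_states,).
    """
    # Note: starting from zeros here for simplicity; Ziebart's backward
    # pass would start terminal states at 0 and everything else at -inf.
    v = np.zeros(n_states)
    for _ in range(iterations):
        # Q(s, a) = r(s) + gamma * sum_s' P(s'|s,a) V(s')
        q = reward[:, None] + discount * (transition_probs @ v)
        # Soft backup: V(s) = log sum_a exp(Q(s, a))
        v = np.logaddexp.reduce(q, axis=1)
    return v
```

Swapping the `np.logaddexp.reduce` line for `q.max(axis=1)` recovers the hard value iteration that `value_iteration.optimal_value` currently performs.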

mohitsharma0690 avatar Jul 31 '17 02:07 mohitsharma0690

Did you reach any conclusion on that?

magnusja avatar Nov 27 '17 23:11 magnusja

I'm pretty sure it is wrong; it should be the softmax value function, not a hard max. It should be pretty easy to fix, though.

mohitsharma0690 avatar Dec 03 '17 01:12 mohitsharma0690

Hey, thanks for the comments. It could very well be a bug (including the initialisation of gridworld) — it's been a long while since I've looked at inverse reinforcement learning so I'm not sure. I'm happy to take pull requests if anyone wants to look into this.

MatthewJA avatar Dec 07 '17 03:12 MatthewJA

Hey guys,

thanks for the responses. In these papers they usually mention that the policy should look something like this: [equation image from Kitani et al.] [equation image from Ziebart et al.]

But I wonder whether and how much it matters in practice, because your code apparently still seems to work.
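My reading of the policy those equations define (roughly, π(a|s) = exp(Q(s,a) − V(s)) with V(s) the log-sum-exp of Q over actions) would be something like the sketch below, assuming soft Q-values have already been computed:

```python
import numpy as np

def soft_policy(q):
    """Stochastic MaxEnt policy: pi(a|s) = exp(Q(s,a) - V(s)), where
    V(s) = logsumexp_a Q(s,a), so each row is a proper distribution.

    q: array of shape (n_states, n_actions) of soft Q-values.
    """
    v = np.logaddexp.reduce(q, axis=1, keepdims=True)
    return np.exp(q - v)
```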

magnusja avatar Dec 13 '17 02:12 magnusja