MaxEnt IRL Run-time optimization
It's pretty slow right now; I suspect the value iteration is the bottleneck. It would be good to do some profiling to pin it down. Moving things to the GPU might speed things up, although I suspect the environments we're using so far are small enough that it's not worth the overhead.
I have an implementation of value iteration in NumPy that's about 50-100x faster than my Python implementation in FastOptimalAgent here. (But my Python implementation is likely a lot slower than yours, because it is very generic, so you'll probably see something more like a 20x speedup.)
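For a sense of where that kind of speedup comes from, here is a minimal sketch of vectorized value iteration for a deterministic MDP; the function name and array layout are hypothetical illustrations, not the linked code. The key trick is that with deterministic transitions you can store a single successor index per (action, state) pair and use NumPy fancy indexing instead of looping over states in Python:

```python
import numpy as np

def value_iteration_deterministic(reward, next_state, gamma=0.9, n_iters=100):
    """Vectorized value iteration for a deterministic MDP.

    reward: (S,) array of per-state rewards.
    next_state: (A, S) int array; next_state[a, s] is the unique state
        reached by taking action a in state s.
    Returns the value function V (S,) and Q-values (A, S).
    """
    n_actions, n_states = next_state.shape
    values = np.zeros(n_states)
    for _ in range(n_iters):
        # Q[a, s] = r(s) + gamma * V(next_state[a, s]); fancy indexing
        # looks up the successor value for every (a, s) pair at once.
        qvalues = reward[None, :] + gamma * values[next_state]
        values = qvalues.max(axis=0)
    return values, qvalues

# Tiny 2-state example: action 0 stays put, action 1 swaps states.
reward = np.array([0.0, 1.0])
next_state = np.array([[0, 1], [1, 0]])
V, Q = value_iteration_deterministic(reward, next_state, gamma=0.9, n_iters=200)
# V converges towards [9, 10]: V(1) = 1/(1-0.9) = 10, V(0) = 0.9 * V(1) = 9.
```

Because the inner update is a couple of whole-array NumPy operations rather than a Python loop over states and actions, the per-iteration cost drops dramatically for any non-trivial state space.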
Thanks for the link! I'm confused how that implementation works: it doesn't seem to look at transition probabilities. What assumptions is it making? (A deterministic gridworld, perhaps?)
Oh, yes, it's assuming a deterministic MDP. I forgot that your gridworlds are slippery.
That said, you should only need to change the lines that add discounted_values to the qvalues so that they sum over all possible transitions. It should be a simple change if you're willing to hardcode the transition function (the actions are already hardcoded, so why not the transitions too).
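The change described above, replacing the single deterministic successor with an expectation over all successors, might look like the following sketch. The transition tensor layout and names here are assumptions for illustration, not the actual code:

```python
import numpy as np

def value_iteration_stochastic(reward, transition, gamma=0.9, n_iters=100):
    """Value iteration with explicit transition probabilities.

    reward: (S,) array of per-state rewards.
    transition: (A, S, S) array; transition[a, s, s2] = P(s2 | s, a).
    Returns the value function V (S,) and Q-values (A, S).
    """
    n_actions, n_states, _ = transition.shape
    values = np.zeros(n_states)
    for _ in range(n_iters):
        # Instead of reading V at one deterministic successor, take the
        # expectation over all successors:
        #   Q[a, s] = r(s) + gamma * sum_s2 T[a, s, s2] * V(s2)
        qvalues = reward[None, :] + gamma * np.einsum('ase,e->as',
                                                      transition, values)
        values = qvalues.max(axis=0)
    return values, qvalues

# Same 2-state example as before, with deterministic dynamics encoded
# as one-hot transition probabilities; a slippery gridworld would just
# spread probability mass across several successors.
reward = np.array([0.0, 1.0])
transition = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: swap
])
V, Q = value_iteration_stochastic(reward, transition, gamma=0.9, n_iters=200)
```

The einsum contraction is the only line that differs from the deterministic version, which is why the change should be simple; the cost is that the transition tensor is O(A * S^2), so for large gridworlds a sparse or hardcoded transition function would be preferable.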