Rohin Shah

2 comments by Rohin Shah

I have an implementation of value iteration in NumPy that's about 50-100x faster than my Python implementation in `FastOptimalAgent` [here](https://github.com/HumanCompatibleAI/planner-inference/blob/master/fast_agents.py). (But my Python implementation is likely a lot slower than...
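
For illustration, here is a minimal sketch of what vectorized value iteration in NumPy can look like for a deterministic MDP. It is not the code from the linked repository: the function name, the `next_state` and `rewards` array layouts, and the default parameters are all assumptions made for this example. The speedup over a pure-Python version typically comes from replacing per-state loops with whole-array operations.

```python
import numpy as np

def value_iteration_deterministic(next_state, rewards, gamma=0.9, num_iters=100):
    """Hypothetical sketch of value iteration for a deterministic MDP.

    next_state: int array of shape (S, A); next_state[s, a] is the state
                reached by taking action a in state s (assumed layout).
    rewards:    float array of shape (S, A); reward for taking a in s.
    Returns the state values (S,) and the last Q-values (S, A).
    """
    num_states, _ = next_state.shape
    values = np.zeros(num_states)
    qvalues = rewards.astype(float)
    for _ in range(num_iters):
        # Fancy indexing looks up the value of every successor state at once.
        discounted_values = gamma * values[next_state]  # shape (S, A)
        qvalues = rewards + discounted_values
        # Bellman optimality backup: best action per state.
        values = qvalues.max(axis=1)
    return values, qvalues
```

A fixed iteration count keeps the sketch short; a convergence check on the change in `values` would be a natural alternative.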

Oh, yes, it's assuming a deterministic MDP. I forgot that your gridworlds are slippery. That said, you should only need to change the lines that add `discounted_values` to the `qvalues`...
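
As a rough sketch of the kind of change described here, the deterministic lookup of each successor's value can be replaced by an expectation over next states. The code below is an assumption-laden illustration, not the repository's implementation: `transition_probs` (shape `(S, A, S)`), the function name, and the defaults are hypothetical, and only the names `discounted_values` and `qvalues` are borrowed from the comment.

```python
import numpy as np

def value_iteration_stochastic(transition_probs, rewards, gamma=0.9, num_iters=100):
    """Hypothetical sketch of value iteration for a stochastic ("slippery") MDP.

    transition_probs: float array of shape (S, A, S); entry [s, a, s2] is
                      the probability of landing in s2 after taking action
                      a in state s (assumed layout, rows sum to 1).
    rewards:          float array of shape (S, A); reward for taking a in s.
    Returns the state values (S,) and the last Q-values (S, A).
    """
    num_states = transition_probs.shape[0]
    values = np.zeros(num_states)
    qvalues = rewards.astype(float)
    for _ in range(num_iters):
        # Expected next-state value for every (s, a) pair:
        # (S, A, S) @ (S,) -> (S, A). This is the only line that differs
        # from a deterministic version using values[next_state].
        discounted_values = gamma * (transition_probs @ values)
        qvalues = rewards + discounted_values
        values = qvalues.max(axis=1)
    return values, qvalues
```

With a one-hot `transition_probs`, this reduces to the deterministic case, so the same routine can cover both settings at the cost of a larger array.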