pymdptoolbox

Model-free algorithms depend on model

Open sovelten opened this issue 9 years ago • 3 comments

It seems that all the algorithms require you to pass a transition probability table and a reward vector. However, most of the usefulness of algorithms such as QLearning comes from the fact that they don't need these values to estimate policies.

Is this by design? A good update to the library would be to enable model-free learning, because most of the time you don't know the model and have to simulate it. This would make the library much more useful to more people.

sovelten, Aug 23 '16 13:08

Good point. As nobody has responded, what are you using as an alternative for such model-free learning?

mrebhan, Nov 10 '17 19:11

I have the same question: what is everybody using as an alternative for model-free learning?

ajaymaity, Mar 25 '21 11:03

Well, you can either build the transition probabilities into the MDP directly and then use methods such as value iteration to find a policy, or you can build the transition probabilities into a simulator and then have a reinforcement learning agent learn from interactions with the simulator. You can find many RL packages on GitHub, but I don't have direct experience with any.
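
To make the two options concrete, here is a minimal sketch in plain Python. It does not use the pymdptoolbox API: the four-state chain MDP, the slip probability, and every name in it (`outcomes`, `step`, `value_iteration`, `q_learning`) are assumptions made up for illustration. `value_iteration` plans with the explicit transition probabilities, while `q_learning` only ever calls the `step` simulator and never reads them.

```python
import random

# Hypothetical toy problem (not from pymdptoolbox): a chain of states 0..3.
N_STATES, GOAL = 4, 3        # state 3 is terminal
ACTIONS = (0, 1)             # 0 = left, 1 = right
SLIP, GAMMA = 0.1, 0.9       # chance the move is reversed; discount factor


def move(state, action):
    """Deterministic part of the dynamics: shift left or right, clipped."""
    return min(max(state + (1 if action == 1 else -1), 0), GOAL)


def outcomes(state, action):
    """Explicit model: list of (probability, next_state, reward)."""
    result = []
    for prob, act in ((1 - SLIP, action), (SLIP, 1 - action)):
        nxt = move(state, act)
        result.append((prob, nxt, 1.0 if nxt == GOAL else 0.0))
    return result


def step(state, action, rng):
    """Simulator: samples one transition; the caller never sees SLIP."""
    if rng.random() < SLIP:
        action = 1 - action
    nxt = move(state, action)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def backup(v, s, a):
    """Expected one-step return of action a in state s under the model."""
    return sum(p * (r + GAMMA * v[n]) for p, n, r in outcomes(s, a))


def value_iteration(tol=1e-9):
    """Option 1: plan with the known transition probabilities."""
    v = [0.0] * N_STATES     # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(backup(v, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    return [max(ACTIONS, key=lambda a: backup(v, s, a)) for s in range(GOAL)]


def q_learning(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Option 2: tabular Q-learning that only interacts with step()."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(GOAL)              # random non-terminal start
        for _ in range(200):                 # cap on episode length
            if rng.random() < epsilon:       # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            nxt, reward, done = step(s, a, rng)
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
            if done:
                break
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

Calling `value_iteration()` and `q_learning()` returns the greedy action for each non-terminal state, so the two policies can be compared directly. In a real problem the `step` function would wrap whatever simulator you have, and `outcomes` would not exist at all.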

BoZenKhaa, Mar 25 '21 11:03