muzero-general
If I know the environment, is it better to train AlphaZero?
If I have access to the environment model, is it faster/better to train AlphaZero instead?
Thanks
Hello, if you want more detail on how the two algorithms differ, I suggest you take a look at #143, which explains the main differences. I have not run experiments comparing the training speed of the two, so I can't give a definitive answer. Judging from the results reported in the two original papers, though, AlphaZero appears to train faster, which is expected since MuZero also has to learn a model of the environment. For inference, however, MuZero could be the better choice, as you would not need to query the environment directly during search, which can sometimes be a real advantage. Hope this helps.
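To make the difference concrete, here is a minimal sketch of how a planning rollout gets its transitions in each case. It is only an illustration, not code from muzero-general: the toy environment, `true_step`, `learned_dynamics`, and `rollout` are all hypothetical names, and the "learned" model is a hand-written approximation standing in for a trained network.

```python
from typing import Callable, List, Tuple


def true_step(state: int, action: int) -> Tuple[int, float]:
    """AlphaZero-style transition: query the known environment rules.

    Hypothetical toy environment: the state is an integer, an action
    adds +1 or -1, and reaching state 3 gives a reward of 1.0.
    """
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def learned_dynamics(hidden: float, action: int) -> Tuple[float, float]:
    """MuZero-style transition: query a model instead of the environment.

    In a real agent this would be a trained network acting on hidden
    states. Here it is a deliberately imperfect stand-in (assumption).
    """
    next_hidden = hidden + 0.9 * action  # slightly wrong on purpose
    predicted_reward = 1.0 if abs(next_hidden - 3.0) < 0.5 else 0.0
    return next_hidden, predicted_reward


def rollout(start, actions: List[int], transition: Callable) -> Tuple[float, float]:
    """Apply a sequence of actions using the given transition function."""
    state, total_reward = start, 0.0
    for action in actions:
        state, reward = transition(state, action)
        total_reward += reward
    return state, total_reward


if __name__ == "__main__":
    plan = [1, 1, 1]
    # Exact planning: needs access to the environment rules.
    print("simulator:", rollout(0, plan, true_step))
    # Approximate planning: needs only the (learned) model.
    print("model:    ", rollout(0.0, plan, learned_dynamics))
```

The search code is the same in both cases and only the transition function changes. With the true simulator the rollout is exact from the start, while the model-based rollout is only as good as the model, which is the part MuZero has to spend training time on.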