muzero-general
MuZero Unplugged
Hey,
I'm wondering if there is any intention to expand the codebase towards MuZero Unplugged, to make it work in an offline RL setting. Maybe by simply enabling reanalyse?
If it's that simple, that would be awesome. The reason I ask is that the pseudocode from the paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (arXiv:1911.08265) differs from the paper in which DeepMind introduced MuZero Unplugged, "Online and Offline Reinforcement Learning by Planning with a Learned Model" (arXiv:2104.06294v1). For example, I couldn't find the self.reanalyse_fraction parameter.
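For reference, here is roughly what I imagine such a parameter could look like in a muzero-general-style config. `reanalyse_fraction` is hypothetical (it is not in this repo); the semantics are adapted from the Unplugged paper, where it controls how much of each training batch comes from reanalysed data versus fresh self-play:

```python
# Hypothetical sketch, not the repo's API: a reanalyse fraction in a
# muzero-general-style config. A value of 1.0 would correspond to the fully
# offline setting of MuZero Unplugged (arXiv:2104.06294), where every sample
# is drawn from the fixed dataset and reanalysed.
class MuZeroConfig:
    def __init__(self):
        self.reanalyse_fraction = 1.0  # 1.0 = fully offline, 0.0 = pure self-play
        self.batch_size = 256

    def num_reanalysed_samples(self):
        # How many samples of each batch should come from the reanalyse buffer.
        return int(self.batch_size * self.reanalyse_fraction)
```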
Kindly try EfficientZero, which also controls the reanalyse part with a fraction argument.
Thanks for the suggestion! In the paper's abstract the authors state that it's a vision-based algorithm. RL, and offline RL in particular, is kind of new for me. That's why I'm wondering whether I could still use the algorithm for my problem, where I don't have a simulator but only a non-image-based MDP dataset of (s, a, r, s') rows that I want to use for offline RL.
Purely state-based RL should be much easier than pixel-based RL. You can still use it, just with different inputs, e.g. a fully connected representation network instead of a convolutional one, as in the sketch below.
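I believe muzero-general already ships a fully connected variant (the `network` option in the game configs); as a rough illustration, a state-based representation network could look like this (names and sizes are illustrative, not the repo's exact API):

```python
# Rough sketch: a fully connected representation network for flat state
# vectors, replacing the convolutional network used for image observations.
import torch
import torch.nn as nn

class StateRepresentationNetwork(nn.Module):
    def __init__(self, observation_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(observation_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # Maps a flat state vector to MuZero's hidden state.
        return self.net(observation)

# Usage: hidden = StateRepresentationNetwork(observation_dim=8)(torch.randn(1, 8))
```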
That makes sense – thanks for your help!
Actually, I am wondering if anyone has tried to combine MuZero with COMBO. I think that is the right direction to overcome the offline/off-policy issue.
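For instance, COMBO's core idea (arXiv:2102.08363) of penalizing value estimates on model-generated states, grafted onto a MuZero-style value head, might look roughly like this. This is a pure sketch under my own assumptions, not a tested implementation, and all names are illustrative:

```python
# Sketch of a COMBO-style conservative regularizer: push down value estimates
# on imagined (model-rollout) states and keep them high on real dataset states.
import torch

def combo_value_penalty(value_net, dataset_states, imagined_states, beta=1.0):
    # Positive when the value net overestimates imagined states relative to
    # dataset states; add this term to the usual MuZero value loss.
    penalty = value_net(imagined_states).mean() - value_net(dataset_states).mean()
    return beta * penalty
```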