muzero-general

MuZero Unplugged

Open • tbskrpmnns opened this issue 2 years ago • 7 comments

Hey,

I'm wondering whether there are any plans to extend the codebase to support MuZero Unplugged, so that it works in an offline RL setting.

tbskrpmnns • Feb 28 '22 20:02

Maybe simply enable reanalyze?

0xJchen • Mar 01 '22 01:03

If it's that simple, that would be awesome. I asked because the pseudocode from "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (arXiv:1911.08265) differs from the pseudocode in "Online and Offline Reinforcement Learning by Planning with a Learned Model" (arXiv:2104.06294v1), the paper in which DeepMind introduced MuZero Unplugged. For example, I couldn't find self.reanalyse_fraction in this codebase.

tbskrpmnns • Mar 01 '22 06:03

Kindly try EfficientZero, which also controls how much data is reanalyzed through a fraction argument.

0xJchen • Mar 01 '22 06:03
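For readers new to the idea, here is a minimal, hypothetical sketch of what a reanalyse fraction controls: the share of each training batch whose targets are recomputed with the latest model from stored data, versus targets produced by actors interacting with an environment. None of these names come from muzero-general, EfficientZero, or the MuZero Unplugged pseudocode; `replan` is a stand-in for running MCTS with the current network, and the targets are toy floats. As I understand the MuZero Unplugged paper, a fraction of 1.0 (every sample reanalysed from a fixed dataset) corresponds to the offline setting.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch, not the API of any MuZero implementation.
Sample = Tuple[str, float]  # (source tag, value target)


def build_batch(
    fresh_targets: Sequence[float],
    offline_states: Sequence[float],
    reanalyse_fraction: float,
    batch_size: int,
    replan: Callable[[float], float],
    rng: random.Random,
) -> List[Sample]:
    """Mix reanalysed and fresh samples according to reanalyse_fraction."""
    if not 0.0 <= reanalyse_fraction <= 1.0:
        raise ValueError("reanalyse_fraction must be in [0, 1]")
    n_reanalysed = round(batch_size * reanalyse_fraction)
    batch: List[Sample] = []
    # Reanalysed part: sample stored states and recompute their targets with
    # the current model (`replan` stands in for MCTS with the latest network).
    for _ in range(n_reanalysed):
        state = rng.choice(offline_states)
        batch.append(("reanalysed", replan(state)))
    # Fresh part: targets that actors produced while interacting with an
    # environment. Unused (and may be empty) when the fraction is 1.0.
    for _ in range(batch_size - n_reanalysed):
        batch.append(("fresh", rng.choice(fresh_targets)))
    return batch
```

For example, `build_batch([], [0.1, 0.2, 0.3], 1.0, 4, lambda s: 2 * s, random.Random(0))` needs no fresh data at all, which is the purely offline case; lowering the fraction mixes in environment-generated samples.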

Thanks for the suggestion! In the paper's abstract, the authors describe it as a vision-based algorithm. RL, and offline RL in particular, is fairly new to me. That's why I'm wondering whether I could still use the algorithm for my problem: I don't have a simulator, only a non-image MDP dataset of (s, a, r, s') rows that I want to use for offline RL.

tbskrpmnns • Mar 01 '22 07:03

Purely state-based RL should be much easier than pixel-based RL. You can still use the algorithm, just with different inputs.

0xJchen • Mar 01 '22 09:03
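One practical step when starting from a flat (s, a, r, s') dataset: MuZero-style training samples positions from whole trajectories and uses n-step bootstrapped value targets, z_t = sum_{i<n} gamma^i r_{t+i} + gamma^n v(s_{t+n}), so the rows first need to be grouped back into episodes. Below is a minimal sketch under loudly stated assumptions: rows are stored in time order, each episode is contiguous, and an episode's last row ends in a terminal state (so no bootstrap past it). The function names are hypothetical, not part of muzero-general or EfficientZero, and `value_fn` stands in for the learned value network (during reanalysis it would be the latest model). States are plain float tuples, so no image encoder is involved.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Row = Tuple[State, int, float, State]  # (s, a, r, s')


def split_into_episodes(rows: Sequence[Row]) -> List[List[Row]]:
    """Chain time-ordered rows into episodes: a new episode starts whenever
    a row's s differs from the previous row's s'. (Assumption.)"""
    episodes: List[List[Row]] = []
    for row in rows:
        if episodes and episodes[-1][-1][3] == row[0]:
            episodes[-1].append(row)
        else:
            episodes.append([row])
    return episodes


def n_step_targets(
    episode: Sequence[Row],
    n: int,
    gamma: float,
    value_fn: Callable[[State], float],
) -> List[float]:
    """n-step value targets; steps near the end are truncated without a
    bootstrap, assuming the episode ends in a terminal state."""
    targets: List[float] = []
    T = len(episode)
    for t in range(T):
        ret, discount, k = 0.0, 1.0, t
        # Discounted sum of up to n rewards starting at step t.
        while k < T and k < t + n:
            ret += discount * episode[k][2]
            discount *= gamma
            k += 1
        if k < T:  # bootstrap from s_{t+n} if it lies inside the episode
            ret += discount * value_fn(episode[k][0])
        targets.append(ret)
    return targets
```

For instance, three chained rows with reward 1.0 each, n=2, gamma=0.5 and a constant value of 10.0 give targets [4.0, 1.5, 1.0]: only the first step is far enough from the end to bootstrap.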

That makes sense, thanks for your help!

tbskrpmnns • Mar 01 '22 11:03


Actually, I'm wondering whether anyone has tried combining MuZero with COMBO (Conservative Offline Model-Based Policy Optimization). I think that's the right direction for overcoming the offline/policy issue.

dbsxdbsx • Oct 04 '22 07:10