d3rlpy [REQUEST] Adding model-based offline RL with image inputs like LOMPO and COMBO

Open kargarisaac opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe.
Model-based offline RL algorithms that can handle image inputs are necessary for some environments.

Describe the solution you'd like
Adding implementations of algorithms like LOMPO and COMBO would be great. The papers mention that both are based on the MOPO implementation, which is written in TensorFlow.

Feb 24 '21 19:02 kargarisaac

@kargarisaac Thanks for your request! I'm also looking at the COMBO paper, and it looks good. Fortunately, d3rlpy already supports MOPO, so it won't take long to implement. I'll update this issue once d3rlpy supports COMBO.

Feb 25 '21 00:02 takuseno

> @kargarisaac Thanks for your request! I'm also looking at the COMBO paper, and it looks good. Fortunately, d3rlpy already supports MOPO, so it won't take long to implement. I'll update this issue once d3rlpy supports COMBO.

Thank you. I want to work on adding LOMPO too. Is there a template or document for contributors?

Mar 11 '21 15:03 kargarisaac

@kargarisaac Sounds nice! Actually, all we have for contributors right now is this document: https://github.com/takuseno/d3rlpy/blob/master/CONTRIBUTING.md

Any kind of contribution is appreciated, and feel free to ask how we implement new algorithms.

Mar 14 '21 15:03 takuseno

Here is a COMBO implementation, but it doesn't support image inputs: https://agit.ai/Polixir/OfflineRL/src/branch/master

One question regarding adding image support for MOPO: are you working on that? I see a TODO in the code for it. Do you know of any model-based offline RL code that can handle image inputs?

Mar 17 '21 12:03 kargarisaac

@kargarisaac Currently, d3rlpy's MOPO does not support image inputs because there was no benchmark for that. But we can make it support image inputs, since all algorithms are basically designed to be independent of the observation shape. One tricky part is that we need to automatically determine the deconvolution layers at the end of the dynamics model.
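
To illustrate the "automatically determine deconvolution layers" point, here is a minimal sketch of one possible approach. It is not d3rlpy's implementation. It assumes the encoder is a stack of unpadded convolutions described by hypothetical `(out_channels, kernel_size, stride)` tuples, and mirrors them into transposed-convolution specs whose `output_padding` restores the original image size. The names `DeconvSpec`, `mirror_encoder`, and `reconstructed_shape` are made up for this example.

```python
from typing import List, NamedTuple, Tuple

Shape = Tuple[int, int, int]          # (channels, height, width)
Filter = Tuple[int, int, int]         # (out_channels, kernel_size, stride), hypothetical format


class DeconvSpec(NamedTuple):
    """Parameters for one transposed convolution (padding=0, dilation=1 assumed)."""
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int
    output_padding: Tuple[int, int]


def conv_out(size: int, kernel: int, stride: int) -> int:
    # Output size of an unpadded convolution along one spatial axis.
    return (size - kernel) // stride + 1


def mirror_encoder(obs_shape: Shape, filters: List[Filter]) -> Tuple[List[DeconvSpec], Shape]:
    """Trace the encoder's shapes, then emit decoder specs in reverse order."""
    channels, height, width = obs_shape
    trace = []
    for out_channels, kernel, stride in filters:
        trace.append((channels, height, width, out_channels, kernel, stride))
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        channels = out_channels

    specs = []
    for in_c, h, w, out_c, k, s in reversed(trace):
        # The floor division in conv_out drops (size - k) % s pixels;
        # output_padding adds them back so the decoder hits the exact size.
        specs.append(DeconvSpec(out_c, in_c, k, s, ((h - k) % s, (w - k) % s)))
    return specs, (channels, height, width)


def reconstructed_shape(latent_shape: Shape, specs: List[DeconvSpec]) -> Shape:
    """Apply the transposed-convolution size formula to check the mirror."""
    channels, height, width = latent_shape
    for spec in specs:
        height = (height - 1) * spec.stride + spec.kernel_size + spec.output_padding[0]
        width = (width - 1) * spec.stride + spec.kernel_size + spec.output_padding[1]
        channels = spec.out_channels
    return channels, height, width


if __name__ == "__main__":
    # Example: stacked 84x84 frames with a three-layer convolutional encoder.
    specs, latent = mirror_encoder((4, 84, 84), [(32, 8, 4), (64, 4, 2), (64, 3, 1)])
    print(latent, reconstructed_shape(latent, specs))
```

Each `DeconvSpec` carries the arguments that a layer such as PyTorch's `torch.nn.ConvTranspose2d` takes (`in_channels`, `out_channels`, `kernel_size`, `stride`, `output_padding`), so the decoder could be built by looping over the returned list. Padded or dilated encoders would need the size formulas adjusted.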

Mar 17 '21 12:03 takuseno

@kargarisaac Sorry for the late update, but I've prototyped COMBO for vector observations: https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/algos/combo.py I haven't tested it yet, but the implementation should be close to the paper.

May 08 '21 09:05 takuseno

@takuseno Thank you. Sounds great :)

May 09 '21 05:05 kargarisaac

> @kargarisaac Sorry for the late update, but I've prototyped COMBO for vector observations: https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/algos/combo.py I haven't tested it yet, but the implementation should be close to the paper.

I tested COMBO and got very poor results on the medium tasks from D4RL (across 3 random seeds); for example, the evaluated return is even negative on some tasks. Could you help me with that?

Jul 20 '21 02:07 dmksjfl
