[REQUEST] Adding model-based offline RL with image inputs like LOMPO and COMBO

Open kargarisaac opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe.
Model-based offline RL algorithms that can handle image inputs are necessary for some environments.

Describe the solution you'd like
Adding implementations of algorithms like LOMPO and COMBO would be great. The papers mention that these are based on the MOPO implementation, which is written in TensorFlow.

kargarisaac avatar Feb 24 '21 19:02 kargarisaac

@kargarisaac Thanks for your request! I've also been looking at the COMBO paper, and it looks good. Fortunately, d3rlpy already supports MOPO, so it won't take long to implement. I'll update this issue once d3rlpy supports COMBO.

takuseno avatar Feb 25 '21 00:02 takuseno

Thank you. I would like to work on adding LOMPO too. Is there a template or documentation for contributors?

kargarisaac avatar Mar 11 '21 15:03 kargarisaac

@kargarisaac Sounds nice! At the moment, all we have for contributors is this document: https://github.com/takuseno/d3rlpy/blob/master/CONTRIBUTING.md

Any kind of contribution is appreciated, and feel free to ask how we implement new algorithms.

takuseno avatar Mar 14 '21 15:03 takuseno

Here is a COMBO implementation, but it doesn't support image inputs: https://agit.ai/Polixir/OfflineRL/src/branch/master

One question about adding image support for MOPO: are you working on that? I see a TODO in the code for it. Do you know of any model-based offline RL code that can handle image inputs?

kargarisaac avatar Mar 17 '21 12:03 kargarisaac

@kargarisaac Currently, d3rlpy's MOPO does not support image inputs because there was no benchmark for that. But we can make it support image inputs, since all algorithms are basically designed to be independent of the observation shape. One tricky part is that we need to automatically determine the deconvolution layers at the end of the dynamics model.

takuseno avatar Mar 17 '21 12:03 takuseno
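
One way to approach the "automatically determine deconvolution layers" step is to mirror the encoder's convolution settings. The sketch below is not taken from d3rlpy. It assumes square images, unpadded convolutions, and an encoder described as a list of `(out_channels, kernel_size, stride)` tuples; the function names and the filter values in the example are hypothetical and only for illustration. For each convolution it records the `output_padding` a transposed convolution would need so that the decoder reproduces the original spatial size exactly.

```python
from typing import List, Tuple

# Assumed encoder description: (out_channels, kernel_size, stride), no padding.
ConvSpec = Tuple[int, int, int]
# Decoder layer: (in_channels, out_channels, kernel_size, stride, output_padding).
DeconvSpec = Tuple[int, int, int, int, int]


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial size after an unpadded convolution."""
    return (size - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int, output_padding: int) -> int:
    """Spatial size after an unpadded transposed convolution."""
    return (size - 1) * stride + kernel + output_padding


def mirror_deconv_specs(
    observation_shape: Tuple[int, int, int], filters: List[ConvSpec]
) -> Tuple[int, List[DeconvSpec]]:
    """Return the latent spatial size and decoder layer specs (in decoder order)."""
    in_channels, size, width = observation_shape
    if size != width:
        raise ValueError("this sketch assumes square images")
    specs: List[DeconvSpec] = []
    for out_channels, kernel, stride in filters:
        if size < kernel:
            raise ValueError("kernel is larger than the feature map")
        # Pixels dropped by the floor division must be restored by the decoder.
        output_padding = (size - kernel) % stride
        # The decoder layer swaps the channel direction of the encoder layer.
        specs.append((out_channels, in_channels, kernel, stride, output_padding))
        size = conv_out(size, kernel, stride)
        in_channels = out_channels
    return size, list(reversed(specs))


# Illustrative filter settings for a 3x64x64 observation (hypothetical values).
latent_size, deconvs = mirror_deconv_specs(
    (3, 64, 64), [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
)

# Walk the decoder to confirm it recovers the input resolution.
restored = latent_size
for _, _, kernel, stride, output_padding in deconvs:
    restored = deconv_out(restored, kernel, stride, output_padding)
```

Each resulting tuple maps onto the arguments of a transposed-convolution layer, for example PyTorch's `ConvTranspose2d(in_channels, out_channels, kernel_size, stride, output_padding=...)`. Because `output_padding` is computed as a remainder modulo the stride, it is always smaller than the stride.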

@kargarisaac Sorry for the delay, but I've prototyped COMBO for vector observations: https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/algos/combo.py I haven't tested it yet, but the implementation should not be far from the paper.

takuseno avatar May 08 '21 09:05 takuseno

@takuseno Thank you. Sounds great :)

kargarisaac avatar May 09 '21 05:05 kargarisaac

I tested COMBO and got very poor results on the medium tasks from D4RL (across 3 random seeds); for example, the evaluated return is even negative on some tasks. Could you help me with that?

dmksjfl avatar Jul 20 '21 02:07 dmksjfl