Metaworld
Users are confused about goal conditioning
Meta-World was designed to be both a Meta-RL and a Multi-Task RL benchmark.
One awkward consequence is that goal conditioning is handled in a complicated way in Meta-World.
Specifically, all environments in Meta-World are goal conditioned, in every benchmark.
However, goals are hidden in Meta-RL, and visible in Multi-Task RL.
This is intended to make "goal inference" part of the Meta-RL objective.
This allows ML1 to be used in a very similar way to older Meta-RL benchmark tasks (like `HalfCheetahVelEnv` or Ant Direction).
However, Meta-RL requires that each task be a fully-observable MDP. This requires each "goal" to be considered a different task, and the API reflects this (an `ML1` benchmark object contains 50 train task objects, `ML10` contains 500 train task objects).
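To make the counting concrete, here is a toy sketch (not Meta-World code) of "one task per (environment, goal) pair". `ToyTask`, `build_train_tasks`, and the placeholder goal values are hypothetical names invented for illustration; only the 50-goals-per-environment figure comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

GOALS_PER_ENV = 50  # figure taken from the issue text


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical stand-in for a task object: one environment plus one goal."""
    env_name: str
    goal: Tuple[float, float, float]


def build_train_tasks(env_names: List[str]) -> List[ToyTask]:
    """Create one task per (environment, goal) pair."""
    tasks = []
    for env_name in env_names:
        for i in range(GOALS_PER_ENV):
            # Placeholder goal positions; a real benchmark would sample these.
            tasks.append(ToyTask(env_name, (0.01 * i, 0.0, 0.0)))
    return tasks


ml1_like = build_train_tasks(["env-0"])
ml10_like = build_train_tasks([f"env-{k}" for k in range(10)])
print(len(ml1_like), len(ml10_like))  # 50 500
```

Under this view, a Multi-Task RL user would describe the second list as "10 tasks with 50 goals each", while a Meta-RL user would describe the same list as "500 tasks".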
However, Meta-World uses the same API for both Meta-RL and Multi-Task RL. Consequently, when using the `Benchmark` API, the goal is changed by passing one of the `task` objects to the `set_task` function.
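The pattern described here can be sketched with a toy environment. This is an illustrative model only: `ToyEnv`, `ToyTask`, and the `goal_visible` flag are hypothetical, and zeroing the goal slot is just one assumed way of hiding a goal from the observation. It shows the same `set_task` call serving both settings, with the goal exposed in one and hidden in the other.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical task object: names an environment and fixes its goal."""
    env_name: str
    goal: float


class ToyEnv:
    """Hypothetical goal-conditioned environment (not the Meta-World class)."""

    def __init__(self, env_name: str, goal_visible: bool):
        self.env_name = env_name
        self.goal_visible = goal_visible
        self._goal: Optional[float] = None

    def set_task(self, task: ToyTask) -> None:
        if task.env_name != self.env_name:
            raise ValueError("task belongs to a different environment")
        self._goal = task.goal

    def reset(self) -> List[float]:
        if self._goal is None:
            raise RuntimeError("call set_task() before reset()")
        state = 0.0
        # Assumption: a hidden goal is reported as zero in the observation.
        goal_slot = self._goal if self.goal_visible else 0.0
        return [state, goal_slot]


tasks = [ToyTask("reach", 0.1 * (i + 1)) for i in range(50)]
task = random.Random(0).choice(tasks)

multi_task_env = ToyEnv("reach", goal_visible=True)   # Multi-Task RL style
meta_rl_env = ToyEnv("reach", goal_visible=False)     # Meta-RL style
for env in (multi_task_env, meta_rl_env):
    env.set_task(task)

obs_mt = multi_task_env.reset()  # goal appears in the observation
obs_ml = meta_rl_env.reset()     # goal must be inferred by the agent
```

In both cases the environment is goal conditioned; the only difference is whether the agent gets to see the goal.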
In particular, many users don't use the `Benchmark` API, and don't set the `seeded_rand_vec` flag either (which randomizes the goals on `reset` using the seed passed to the environment on init).
This leads users to believe the environments are not goal conditioned, even though they definitely are supposed to be (50 goals per task, set by the seed).
I don't know how many inconsistent results have been published because of this confusion, but there are at least a few.
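A toy sketch of the pitfall, under stated assumptions: `ToyGoalEnv` and its `randomize_goals` flag are hypothetical stand-ins (the flag plays the role that `seeded_rand_vec` plays in the description above), and the default of `False` is assumed for illustration. With the flag left off, every reset yields the same goal, which is what can make the environment look like it is not goal conditioned.

```python
import random


class ToyGoalEnv:
    """Hypothetical environment constructed directly, without a benchmark object."""

    DEFAULT_GOAL = 0.5

    def __init__(self, seed: int = 0, randomize_goals: bool = False):
        self.randomize_goals = randomize_goals
        self._rng = random.Random(seed)  # seed fixed at construction time
        self.goal = self.DEFAULT_GOAL

    def reset(self) -> float:
        if self.randomize_goals:
            # Draw a new goal from the seeded generator on every reset.
            self.goal = self._rng.uniform(-1.0, 1.0)
        return self.goal


# Flag left at its default: the goal never changes across resets.
fixed_env = ToyGoalEnv(seed=0)
fixed_goals = {fixed_env.reset() for _ in range(50)}

# Flag enabled: goals vary across resets, reproducibly for a given seed.
random_env = ToyGoalEnv(seed=0, randomize_goals=True)
random_goals = {random_env.reset() for _ in range(50)}

print(len(fixed_goals))  # 1
```

An agent trained only on `fixed_env` sees a single goal, so results from it are not comparable with results from a setup that varies goals.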
TL;DR: Meta-RL requires ML10 to have 500 tasks, Multi-Task RL wants MT10 to have 10 tasks with 50 goals. This confuses users.
We should make the documentation and API clearer and harder to misuse.
A good first step would be renaming the `seeded_rand_vec` flag and setting it to True by default in all of the environment constructors when not using the `Benchmark` API. Unfortunately, this is a breaking change, and we haven't published any versioned package, so we should make sure we have published at least one version of the package before we do this.