MAML performance on 2D navigation
Thank you for the clean and well-documented library! I am trying to use MAML for 2D navigation, but the policies I obtain are suboptimal. In particular, rollouts from the adapted policy after training predominantly look something like this.

I am using the current master version (https://github.com/rlworkgroup/garage/commit/43e7b78f4c3a8e977c8140e5e56c8c793f15263c).
This gist contains a minimal example that should reproduce the behavior. I have tried to keep hyperparameters at their defaults wherever defaults are given. Additionally, this is the debug log from running the minimal example above.
Apologies for the vague issue - I am not sure whether this behavior is due to hyperparameters or an issue somewhere in the meta-RL pipeline. Thank you in advance!
As an additional comment, I have made some small changes to envs/point_env.py:
```diff
 # envs/point_env.py, line 26
-goal=np.array((1., 1.), dtype=np.float32),
+goal=np.array((0.3, 0.3), dtype=np.float32),
 # envs/point_env.py, line 27
-arena_size=5.,
+arena_size=0.5,
 # envs/point_env.py, line 198
-goals = np.random.uniform(-2, 2, size=(num_tasks, 2))
+goals = np.random.uniform(-self._arena_size, self._arena_size, size=(num_tasks, 2))
```
These changes shrink the default arena to a unit square and sample task goals from the arena itself rather than from the fixed `[-2, 2]` box. I am happy to submit a pull request for them if you believe they make sense.
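For reference, the effect of the patched goal sampling can be sketched in isolation. This is a hypothetical standalone helper (`sample_goals` is not a garage API), mirroring the edited `sample_tasks` logic under the assumption that `arena_size=0.5`:

```python
import numpy as np


def sample_goals(num_tasks, arena_size=0.5, seed=None):
    """Sketch of the patched PointEnv goal sampling: goals are drawn
    uniformly from the arena (a square of half-width ``arena_size``)
    instead of the hard-coded [-2, 2] box."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-arena_size, arena_size, size=(num_tasks, 2))
```

With the old defaults, goals could land far outside a small arena; sampling from the arena keeps every task's goal reachable within it.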
@kristian-georgiev I think the issue is that the MAML implementation in garage uses an MLP baseline, which the algorithm tries to relearn on every training epoch.
Traditionally, MAML fits a linear feature baseline for this variance reduction, and a linear baseline fits new data considerably faster than a neural-network baseline.
If you check out the branch avnish-new-metaworld-results, you'll find a linear baseline that I've hacked together for torch, along with the modifications I've made to MAML to make it work with that linear baseline.
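To illustrate why a linear baseline adapts quickly: it has a closed-form least-squares fit, so refitting per meta-batch is cheap. Below is a minimal sketch in the style of the classic rllab `LinearFeatureBaseline` (hand-crafted features: clipped observations, their squares, and powers of the timestep); the actual torch version on the branch may differ:

```python
import numpy as np


class LinearFeatureBaseline:
    """Ridge-regression baseline on hand-crafted path features.
    Sketch only; feature choices follow the classic rllab baseline."""

    def __init__(self, reg_coeff=1e-5):
        self._coeffs = None
        self._reg_coeff = reg_coeff

    def _features(self, path):
        obs = np.clip(path["observations"], -10, 10)
        t = np.arange(len(obs)).reshape(-1, 1) / 100.0
        return np.concatenate(
            [obs, obs ** 2, t, t ** 2, t ** 3, np.ones_like(t)], axis=1)

    def fit(self, paths):
        # Closed-form regularized least squares: solving one small
        # linear system is far cheaper than retraining an MLP baseline.
        feats = np.concatenate([self._features(p) for p in paths])
        returns = np.concatenate([p["returns"] for p in paths])
        k = feats.shape[1]
        self._coeffs = np.linalg.lstsq(
            feats.T @ feats + self._reg_coeff * np.eye(k),
            feats.T @ returns, rcond=None)[0]

    def predict(self, path):
        if self._coeffs is None:
            return np.zeros(len(path["returns"]))
        return self._features(path) @ self._coeffs
```

The MLP baseline, by contrast, needs gradient steps to track the return statistics of each new task batch, which can lag behind the rapidly shifting post-adaptation policies during meta-training.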
Thanks, Avnish.