Source MiniGrid/BabyAI Dataset

Open daniellawson9999 opened this issue 1 year ago • 8 comments

Background

BabyAI is a "gridworld environment whose levels consist of instruction-following tasks that are described by a synthetic language". Gato generates its BabyAI dataset using the built-in BabyAI bot; more details can be found in the paper.

The original repo is now maintained under Farama, alongside MiniGrid. The 2023 update to the BabyAI repo discusses this change and also says:

"This repository still contains scripts which, if adapted to the Minigrid library, could be used to:

  • Produce demonstrations using the BabyAI bot,
  • Train Imitation Learning agents using the bot-generated demonstrations as training trajectories."

More information on Minigrid can be found at https://minigrid.farama.org/, which provides both the original BabyAI environments and the MiniGrid environments.

Tasks

As in issue https://github.com/ManifoldRG/NEKO/issues/13, requirement (1) is that environments meet the Gymnasium API. This is already accomplished, as the Minigrid repo follows the new API.

The remaining task, requirement (2), is sourcing a dataset and porting it to Minari. There are several paths to sourcing a dataset:

  1. Collect the dataset manually using the BabyAI bot (https://github.com/mila-iqia/babyai/blob/master/babyai/bot.py), which may have to be adapted to the new Minigrid repo.

  2. See if papers using Minigrid/BabyAI provide datasets; some papers are listed at https://minigrid.farama.org/content/publications/ . In this case, the dataset just needs to be converted to Minari.

  3. Collaborate with Minari on sourcing the dataset. The repo https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation says that more datasets are coming to Minari, and the end of its README lists Minigrid. Potentially reach out to https://github.com/rodrigodelazcano or others at Minari; the Discord can be found at https://farama.org/
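
For path 1, the collection loop itself could look roughly like the sketch below. It assumes only the Gymnasium API from requirement (1): `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`. The names `Episode`, `collect_episodes`, and `CountdownEnv` are hypothetical and exist only for illustration; `CountdownEnv` is a stand-in so the sketch runs without Minigrid installed, and the conversion to Minari is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Episode:
    """One trajectory; observations holds one more entry than actions."""
    observations: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    terminated: bool = False
    truncated: bool = False


def collect_episodes(env, policy: Callable[[Any], Any],
                     num_episodes: int, seed: int = 0) -> List[Episode]:
    """Roll out `policy` in a Gymnasium-style `env` and record each episode."""
    episodes = []
    for i in range(num_episodes):
        # A different seed per episode gives a different level layout
        # in procedurally generated environments.
        obs, _info = env.reset(seed=seed + i)
        ep = Episode(observations=[obs])
        while not (ep.terminated or ep.truncated):
            action = policy(obs)
            obs, reward, ep.terminated, ep.truncated, _info = env.step(action)
            ep.actions.append(action)
            ep.rewards.append(float(reward))
            ep.observations.append(obs)
        episodes.append(ep)
    return episodes


class CountdownEnv:
    """Hypothetical stand-in env (not Minigrid): ends after `length` steps."""

    def __init__(self, length: int = 3):
        self.length = length
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        reward = 1.0 if terminated else 0.0
        return self.t, reward, terminated, False, {}


if __name__ == "__main__":
    demo = collect_episodes(CountdownEnv(), policy=lambda obs: 0, num_episodes=2)
    print(len(demo), "episodes,", len(demo[0].actions), "steps in the first")
```

In actual use, `env` would be a BabyAI environment from Minigrid and `policy` would wrap the adapted bot. Since the bot may plan from the environment's internal state rather than from the observation, the policy callable may need to close over `env` instead of relying on `obs` alone.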

If interested, please add yourself to this issue, and discuss which path you are pursuing.

Output

The output should be a link to a GitHub repo that provides a process for acquiring the dataset as in https://github.com/daniellawson9999/data-tests.

daniellawson9999, May 28 '23 19:05