
HuggingFace Integration

Open • balisujohn opened this issue 1 year ago • 1 comment

Description

Draft PR for Hugging Face integration in Minari.

Adds functions to convert back and forth between MinariDataset and datasets.Dataset from the Hugging Face datasets library. It also adds functions that let the user push datasets to and pull datasets from the Hugging Face Hub. The core code is ready for review, but there are a few more features I still plan to add; they are tracked in the Additional Features checklist in this description.
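A minimal sketch of the kind of round trip described above, assuming episodes are held in memory as dicts of per-step lists. It does not use Minari's or Hugging Face's actual APIs: `episodes_to_columns` and `columns_to_episodes` are hypothetical names, and the sketch assumes every field has exactly one entry per step (real Minari episodes store one more observation than actions, which is ignored here). The flat column dict it produces, with an `episode_id` column marking episode boundaries, has the shape that `datasets.Dataset.from_dict` accepts and `Dataset.to_dict` returns.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for an episode: each field maps to a
# per-step list (this is NOT Minari's actual episode type).
Episode = Dict[str, List[Any]]
Columns = Dict[str, List[Any]]


def episodes_to_columns(episodes: List[Episode]) -> Columns:
    """Flatten episodes into one row per step, tagging rows with episode_id."""
    columns: Columns = {"episode_id": []}
    expected_keys = set(episodes[0]) if episodes else set()
    for episode_id, episode in enumerate(episodes):
        if set(episode) != expected_keys:
            raise ValueError(f"episode {episode_id} has different fields")
        lengths = {len(values) for values in episode.values()}
        if len(lengths) != 1:
            raise ValueError(f"episode {episode_id} has ragged or no fields")
        (n_steps,) = lengths
        columns["episode_id"].extend([episode_id] * n_steps)
        for key, values in episode.items():
            columns.setdefault(key, []).extend(values)
    return columns


def columns_to_episodes(columns: Columns) -> List[Episode]:
    """Rebuild episodes; rows must be grouped by contiguous ids 0, 1, 2, ..."""
    fields = [key for key in columns if key != "episode_id"]
    episodes: List[Episode] = []
    for row, episode_id in enumerate(columns["episode_id"]):
        if episode_id == len(episodes):
            # First row of a new episode.
            episodes.append({key: [] for key in fields})
        elif episode_id != len(episodes) - 1:
            raise ValueError("episode_id values must be contiguous and increasing")
        for key in fields:
            episodes[episode_id][key].append(columns[key][row])
    return episodes


eps = [{"observations": [0, 1], "actions": [1, 0]}, {"observations": [5], "actions": [1]}]
cols = episodes_to_columns(eps)
```

Flattening to one row per step keeps the table rectangular, which suits column-oriented storage; the cost is that episode boundaries have to be rebuilt on the way back, which is why `columns_to_episodes` insists on contiguous ids.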

I also refactored the tests slightly, adding a new helper function, create_dummy_dataset_with_collecter_env_helper, to avoid code repetition.

Additional Features

  • [ ] CLI wrappers for pushing and pulling datasets.
  • [ ] Wrappers to automatically convert Minari datasets when uploading and downloading.
  • [ ] Support for Text spaces.

Checklist:

  • [ ] I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • [ ] I have run pytest -v and no errors are present.
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have resolved, to the best of my knowledge, any warnings generated by pytest -v that are related to my code.
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes

balisujohn • Jun 20 '23 09:06

@RedTachyon One thing that is likely to happen is that we will support the HF data format natively, and then these conversion functions will be no-ops. We have MinariStorage, which is designed to hide the differences between file formats from the dataset (see https://github.com/Farama-Foundation/Minari/pull/94#discussion_r1241855457).

Also, if we want to support other libraries as well (e.g. RLDS), we would need a separate conversion function for each of them; I don't think exposing these is useful to the user.
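One way to read the argument above, sketched with hypothetical in-memory classes (none of these names are Minari's actual MinariStorage API): if the dataset only talks to a storage interface, exporting to Hugging Face-style columns is a plain read, which is literally a no-op when the backend already holds that format. Supporting another ecosystem such as RLDS would then mean adding a backend, not another user-facing conversion function.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


class StorageBackend(ABC):
    """Hypothetical interface that hides the storage format from the dataset."""

    @abstractmethod
    def read_columns(self) -> Columns:
        """Return all steps as columns: an "episode_id" column plus data fields."""


class HFStyleStorage(StorageBackend):
    """Stand-in for a backend whose native layout is already HF-style columns."""

    def __init__(self, columns: Columns) -> None:
        self.columns = columns

    def read_columns(self) -> Columns:
        return self.columns  # native format: nothing to translate


class RowStorage(StorageBackend):
    """Stand-in for some other format, here a list of per-step row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def read_columns(self) -> Columns:
        keys = list(self.rows[0]) if self.rows else []
        return {key: [row[key] for row in self.rows] for key in keys}


class SketchDataset:
    """Dataset logic depends only on the backend interface, never on a format."""

    def __init__(self, storage: StorageBackend) -> None:
        self.storage = storage

    def total_steps(self) -> int:
        return len(self.storage.read_columns().get("episode_id", []))

    def num_episodes(self) -> int:
        return len(set(self.storage.read_columns().get("episode_id", [])))


def to_hf_columns(dataset: SketchDataset) -> Columns:
    """"Conversion" is just a read; for HF-style storage it returns the same object."""
    return dataset.storage.read_columns()


hf_cols = {"episode_id": [0, 0, 1], "obs": [0, 1, 5]}
ds_hf = SketchDataset(HFStyleStorage(hf_cols))
rows = [{"episode_id": 0, "obs": 0}, {"episode_id": 0, "obs": 1}, {"episode_id": 1, "obs": 5}]
ds_rows = SketchDataset(RowStorage(rows))
```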

Finally, if we add loading keyword arguments to load_dataset, they would also have to be added to convert_hugging_face_dataset_to_minari_dataset, with the same semantics.
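A sketch of one way to keep loading keywords in sync, assuming a hypothetical `max_episodes` keyword and hypothetical function names (`load_dataset_sketch`, `convert_hf_columns_sketch`): both entry points forward their keyword arguments to a single helper, so each keyword is interpreted in exactly one place and an unknown keyword fails the same way in both.

```python
from typing import Any, Dict, List, Optional

Columns = Dict[str, List[Any]]


def _select_episodes(columns: Columns, *, max_episodes: Optional[int] = None) -> Columns:
    """The one place where loading keywords are interpreted.

    max_episodes is a hypothetical keyword; this sketch assumes episode ids
    are 0, 1, 2, ... so "the first N episodes" means ids below N.
    """
    if max_episodes is None:
        # Copy so callers cannot mutate the stored columns through the result.
        return {key: list(values) for key, values in columns.items()}
    keep = [row for row, ep in enumerate(columns["episode_id"]) if ep < max_episodes]
    return {key: [values[row] for row in keep] for key, values in columns.items()}


def load_dataset_sketch(name: str, registry: Dict[str, Columns], **load_kwargs: Any) -> Columns:
    """Stand-in for a local loader: look up stored columns, then apply shared keywords."""
    return _select_episodes(registry[name], **load_kwargs)


def convert_hf_columns_sketch(hf_columns: Columns, **load_kwargs: Any) -> Columns:
    """Stand-in for an HF-to-Minari converter: same keywords, same code path."""
    return _select_episodes(hf_columns, **load_kwargs)


cols = {"episode_id": [0, 0, 1, 2], "obs": [0, 1, 5, 7]}
registry = {"demo": cols}
```

With this layout, adding a new loading keyword means changing `_select_episodes` only, so the two entry points cannot drift apart.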

younik • Jul 04 '23 17:07