Open-Assistant

Where to contribute datasets

Open iurimatias opened this issue 2 years ago • 3 comments

As part of phase 1, demonstration data is collected. I would like to contribute such data:

  • should it go into this repo or somewhere else?
  • should it be the data itself or only the scripts that generate such data?

iurimatias, Jan 03 '23 01:01

I think just the scripts that generate the data, with the data itself being collected and put in S3 buckets or on HuggingFace or whatever.

We really need an example data collection package within the main repo, with an example script that people can use as a starting point. This could be either a module or a sub-package. The danger of not having this structure is that we'll end up with tons of data collection scripts that have to be run manually, each with its own dependencies and manual setup steps, and each writing its output to a different place.
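A minimal sketch of what such an example script might look like, assuming a stdlib-only Python template. Every name in it (`DemonstrationRecord`, `generate_records`, `write_jsonl`, the `OA_DATA_DIR` environment variable) and the `prompt`/`reply`/`source` record fields are hypothetical, not the project's actual package or schema. The point is the shared shape: dataset-specific logic lives in one generator, and every script writes the same format to the same configurable place, so uploading to S3 or Hugging Face can happen in a single later step.

```python
"""Hypothetical template for a data collection script.

All names here (DemonstrationRecord, generate_records, write_jsonl,
OA_DATA_DIR) are illustrative and not part of the Open-Assistant repo.
"""
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, Iterator


@dataclass
class DemonstrationRecord:
    # Assumed minimal schema: one prompt, one human-written reply, plus provenance.
    prompt: str
    reply: str
    source: str


def generate_records() -> Iterator[DemonstrationRecord]:
    # Placeholder for the dataset-specific logic (scraping, parsing, converting).
    yield DemonstrationRecord(prompt="What is 2 + 2?", reply="2 + 2 = 4.", source="example")


def write_jsonl(records: Iterable[DemonstrationRecord], out_dir: Path, name: str) -> Path:
    # Every script writes JSON Lines into the same configurable directory,
    # so a separate step can upload the results in one place.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
    return path


def main() -> None:
    # Hypothetical environment variable for the shared output location.
    out_dir = Path(os.environ.get("OA_DATA_DIR", "data"))
    path = write_jsonl(generate_records(), out_dir, "example_dataset")
    print(f"wrote {path}")


if __name__ == "__main__":
    main()
```

A new dataset contributor would then only replace `generate_records` and pick a name, leaving dependencies, output format, and output location uniform across scripts.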

I'll add a ticket for that if there isn't one already.

bitplane, Jan 03 '23 13:01

Added #331

bitplane, Jan 03 '23 14:01

And closed it in favour of #165 :)

bitplane, Jan 03 '23 14:01

Working on addressing this in https://github.com/LAION-AI/Open-Assistant/pull/324

lewtun, Jan 03 '23 22:01