Open-Assistant
Where to contribute datasets
As part of phase 1, demonstration data is collected. I would like to contribute such data:
- Should it go into this repo or somewhere else?
- Should it be the data itself, or only the scripts that generate the data?
I think just the scripts that generate the data, with the data itself being collected and put in S3 buckets or on HuggingFace or whatever.
We really need an example data collection package within the main repo, with an example script that people can use as a starting point. This could be either a module or a sub-package. Without this structure, we'll end up with tons of data collection scripts that have to be run manually, each with its own dependencies and manual setup steps, and that output data to different places.
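To make that concrete, here is a rough sketch of what a starting-point script might look like. Everything in it is an assumption for illustration: the function names `collect` and `write_jsonl`, the `prompt`/`response` record fields, the JSONL format, and the `output/demo.jsonl` path are all hypothetical, not an agreed layout for this repo.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def collect() -> Iterator[dict]:
    """Yield demonstration records.

    Placeholder data only (hypothetical schema); a real script would
    pull from or transform an actual source here.
    """
    yield {"prompt": "What is the capital of France?", "response": "Paris."}
    yield {"prompt": "Name a prime number.", "response": "7"}


def write_jsonl(records: Iterable[dict], out_path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical output location; uploading to S3 or HuggingFace
    # would be a separate step.
    n = write_jsonl(collect(), Path("output") / "demo.jsonl")
    print(f"wrote {n} records")
```

If every script exposed the same two steps (produce records, write them to one agreed place), contributors would only need to replace the body of `collect`, and the output location would stay consistent across datasets.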
I'll add a ticket for that if there isn't one already.
Added #331
And closed it in favour of #165 :)
Working on addressing this in https://github.com/LAION-AI/Open-Assistant/pull/324