Open-Assistant
Add standard task list to data tickets?
I think we should have a standard set of instructions that we can edit into the first post of each data collection ticket, linking the appropriate tickets and so on:
edit: updated with feedback from @dctanner
Task list:
- [ ] Evaluation of legality / usefulness / scope (discussion in this issue)
- [ ] Scraper code written / available (code to collect or scrape raw data from source)
- [ ] Raw data set is published (or just a link to the data if it already exists)
- [ ] Formatter code pull request (downloads raw data and converts to our format - link pull request here)
- [ ] Code run and data published to OpenAssistant Hugging Face
See [datasets docs](https://github.com/LAION-AI/Open-Assistant/blob/main/docs/docs/data/datasets.md) for the process.
https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists
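If the list ends up being pasted into many tickets, a small helper could render it as GitHub task-list markdown with some steps already ticked. This is only a rough sketch: the names `TASKS` and `render_task_list` and the `done` parameter are made up for illustration and are not part of the repo.

```python
# Hypothetical helper, not part of the Open-Assistant repo:
# renders the proposed checklist as GitHub task-list markdown.
TASKS = [
    "Evaluation of legality / usefulness / scope (discussion in this issue)",
    "Scraper code written / available (code to collect or scrape raw data from source)",
    "Raw data set is published (or just a link to the data if it already exists)",
    "Formatter code pull request (downloads raw data and converts to our format - link pull request here)",
    "Code run and data published to OpenAssistant Hugging Face",
]


def render_task_list(tasks, done=()):
    """Return one '- [ ]' or '- [x]' line per task.

    `done` holds the 0-based indices of steps that are already finished.
    """
    done = set(done)
    lines = []
    for i, task in enumerate(tasks):
        box = "x" if i in done else " "
        lines.append(f"- [{box}] {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a ticket where only the legality/scope discussion is finished.
    print(render_task_list(TASKS, done=[0]))
```

The output can be pasted straight into the first post of a ticket; GitHub turns each `- [ ]` line into a clickable checkbox.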
I think this would be useful. We could also specify that data should be published to HF, and provide a spec for the ideal format we would like data to be published in.
I love this! We should have done this in the first place :)
I've updated the list with some feedback. I'll do a smallish one myself first and iron out the process before mass-editing people's data issues. That might take a few days, so more feedback is welcome :)