Add standard task list to data tickets?

Open bitplane opened this issue 1 year ago • 3 comments

I think we should have a nice set of instructions to add to data collection tickets, which we can edit into the first post, linking the appropriate tickets, etc.:

edit: updated with feedback from @dctanner

Task list:

- [ ] Evaluation of legality / usefulness / scope (discussion in this issue)
- [ ] Scraper code written / available (code to collect or scrape raw data from source)
- [ ] Raw data set is published (or just a link to the data if it already exists)
- [ ] Formatter code pull request (downloads raw data and converts to our format - link pull request here)
- [ ] Code run and data published to OpenAssistant Hugging Face

See [datasets docs](https://github.com/LAION-AI/Open-Assistant/blob/main/docs/docs/data/datasets.md) for the process.

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists
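
For anyone pasting the checklist into many issues, here is a minimal sketch that renders it as GitHub task-list markdown. The item text is copied from the list above; the `render_task_list` helper and its `done` parameter are hypothetical names used only for illustration, and the `- [ ]` / `- [x]` syntax is assumed from the GitHub task-list docs linked above.

```python
# Checklist items copied from the task list in this issue.
TASKS = [
    "Evaluation of legality / usefulness / scope (discussion in this issue)",
    "Scraper code written / available (code to collect or scrape raw data from source)",
    "Raw data set is published (or just a link to the data if it already exists)",
    "Formatter code pull request (downloads raw data and converts to our format - link pull request here)",
    "Code run and data published to OpenAssistant Hugging Face",
]


def render_task_list(tasks, done=()):
    """Return GitHub task-list markdown.

    `done` holds the zero-based indices of items that are already complete.
    """
    lines = []
    for index, task in enumerate(tasks):
        mark = "x" if index in done else " "
        lines.append(f"- [{mark}] {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the unchecked template, ready to paste into an issue's first post.
    print(render_task_list(TASKS))
```

Passing something like `done={0}` would tick the first box, which may be handy when an issue has already been through the legality discussion.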

bitplane, Feb 26 '23 05:02

I think this would be useful. We could also specify that data should be published to HF, and provide a spec for the ideal format we would like data to be published in.

olliestanley, Feb 26 '23 12:02

I love this! We should have done this in the first place :)

huu4ontocord, Feb 26 '23 16:02

I've updated with some feedback. I'll actually do a smallish one and iron out the process before mass-editing people's data issues - might take a few days so more feedback is welcome :)

bitplane, Feb 26 '23 19:02