
Recipes

Open vincentarelbundock opened this issue 12 years ago • 5 comments

In terms of repository structure, I think it would be beneficial to split each data source into separate files. The idea would be to create a standardized "recipe" format that includes all the info about the dataset (e.g. where to download it, BibTeX citation, name of the cleaning script, date updated), plus a cleaning script that does all the magic we need.

I use something like that locally: a YAML file that specifies all the info, and an accompanying Python script that I use for cleaning.

This makes user contributions very easy. They just cut and paste another "recipe" and include an R script that does the cleaning. The only thing psData has to do is provide a proper API to parse the recipe, download the data, and activate the cleaning script.
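To make the recipe idea concrete, here is a minimal sketch in Python of what parsing and validating a recipe could look like. The field names, the example URL, and the `parse_recipe` helper are all assumptions made for illustration, not an existing psData format or API. The flat `key: value` reader is only a stand-in for a real YAML parser.

```python
# Hypothetical recipe text; field names and values are illustrative only.
RECIPE_TEXT = """\
name: database_political_institutions
url: https://example.org/dpi.csv
bibtex: dpi.bib
cleaning_script: database_political_institutions.R
updated: 2014-03-05
"""

# Assumed set of mandatory fields, taken from the info listed in the proposal.
REQUIRED_FIELDS = ("name", "url", "bibtex", "cleaning_script", "updated")


def parse_recipe(text):
    """Parse flat `key: value` lines (a stand-in for a real YAML parser)."""
    recipe = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URLs keep their own colons.
        key, _, value = line.partition(":")
        recipe[key.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if f not in recipe]
    if missing:
        raise ValueError("recipe is missing fields: " + ", ".join(missing))
    return recipe


recipe = parse_recipe(RECIPE_TEXT)
print(recipe["cleaning_script"])
```

Validating the required fields up front means a contributor who copies a recipe and forgets a field gets an immediate, readable error rather than a failure halfway through a download.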

Think of something like Homebrew for the Mac and its library of "formulas":

https://github.com/Homebrew/homebrew/tree/master/Library/Formula

vincentarelbundock avatar Mar 05 '14 13:03 vincentarelbundock

Copied over from https://github.com/rOpenGov/psData/issues/8

You would have 2 files:

- database_political_institutions.yaml (download url, bibtex cite, etc.)
- database_political_institutions.R (cleaning script with all transformations)

And a standardized function:

get_data(): Parse .yaml file, download data if not already cached, and run R script. If flagged for caching, then copy yaml, raw data, processed data and R script to specified path.
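The steps described for `get_data()` could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the actual psData implementation: the recipe is passed in as an already-parsed dict, the keys `name`, `url` and `cleaning_script` are hypothetical, and the download and cleaning steps are injected as callables so the sketch runs without a network connection or an R installation.

```python
import shutil
import tempfile
from pathlib import Path


def get_data(recipe, recipe_dir, cache_dir, download, run_script, keep_copy=None):
    """Download raw data unless cached, run the cleaning script, optionally archive.

    recipe     -- dict parsed from the .yaml file (hypothetical keys)
    download   -- callable(url, dest_path), injected for illustration
    run_script -- callable(script_path, raw_path, out_path), e.g. a wrapper
                  that would hand the cleaning script to R
    keep_copy  -- if given, copy recipe, script, raw and processed data there
    """
    recipe_dir, cache_dir = Path(recipe_dir), Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = recipe["name"]
    raw = cache_dir / (name + ".raw")
    processed = cache_dir / (name + ".clean")
    script = recipe_dir / recipe["cleaning_script"]

    if not raw.exists():  # download only when not already cached
        download(recipe["url"], raw)
    run_script(script, raw, processed)

    if keep_copy is not None:
        keep = Path(keep_copy)
        keep.mkdir(parents=True, exist_ok=True)
        for path in (recipe_dir / (name + ".yaml"), script, raw, processed):
            shutil.copy(path, keep / path.name)
    return processed


# Demo with stand-in callables (no network, no R needed).
calls = []


def fake_download(url, dest):
    calls.append(url)
    Path(dest).write_text("country,year\nfr,2014\n")


def fake_clean(script, raw, out):
    # Placeholder "cleaning": upper-case the raw file.
    Path(out).write_text(Path(raw).read_text().upper())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    recipes = root / "recipes"
    recipes.mkdir()
    (recipes / "dpi.yaml").write_text("name: dpi\n")
    (recipes / "dpi.R").write_text("# cleaning steps\n")
    recipe = {"name": "dpi", "url": "https://example.org/dpi.csv",
              "cleaning_script": "dpi.R"}
    get_data(recipe, recipes, root / "cache", fake_download, fake_clean)
    out = get_data(recipe, recipes, root / "cache", fake_download, fake_clean,
                   keep_copy=root / "archive")
    result = out.read_text()
    archived = sorted(p.name for p in (root / "archive").iterdir())
```

In the demo the second call reuses the cached raw file, so `fake_download` runs only once, and the archive directory ends up holding the yaml, the R script, the raw data and the processed data together.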

vincentarelbundock avatar Mar 05 '14 13:03 vincentarelbundock

If we get something like that, I would almost certainly contribute recipes.

vincentarelbundock avatar Mar 05 '14 13:03 vincentarelbundock

Just had a pie in the sky thought: It would be interesting if we could create a really simple website where someone who hosted a data set could fill out a form with specific metadata and information on how to download the data set.

On submission of the web form the recipe would be generated and a pull request would be initiated.

This would make it really easy to contribute new recipes.

christophergandrud avatar Apr 17 '14 15:04 christophergandrud

If someone has the chance to work on the implementation, we could probably arrange server space with rOpenGov.

antagomir avatar Apr 17 '14 17:04 antagomir

Sounds good. This can be something to work on in #12.

christophergandrud avatar Apr 18 '14 05:04 christophergandrud