datastorr
Integrate with OKFN data packages
Adding a packages.json file that contains the metadata would probably satisfy most of the requirements.
Here's the website with a bit more information: http://data.okfn.org/doc/data-package
Importantly, this can be in addition to what we currently have, and it would allow better interoperability. I don't believe there is good R tooling for dealing with data packages yet, though.
So, for taxonlookup, would we take what's now in the GitHub readme.md and put it in a .json file? I guess ideally it would be in the R documentation as well. I guess we need a system where the metadata in one place (the JSON file?) is canonical and the other two are generated?
I really like the idea of the OKFN data packages, so in principle it would be great to support them. Depends how much work it is. Seems low cost.
Generating the readme from a single canonical source for metadata shouldn't be too hard. I tried something like this a while back, where I used a JSON file with metadata to write the readme (see readme.Rmd in github.com/dfalster/Falster_2005_JEcol_data). Now I know there is an actual preferred format for that metadata.
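As a rough illustration of the "one canonical metadata file, everything else generated" idea, here is a minimal Python sketch that renders a readme from a data-package-style JSON document. The function name `render_readme` and the readme layout are assumptions made for this example; they are not part of datastorr or of the approach used in the repository linked above.

```python
import json


def render_readme(metadata: dict) -> str:
    """Render a Markdown readme from a data-package-style metadata dict.

    Hypothetical helper: the section layout is an arbitrary choice.
    """
    lines = [f"# {metadata['name']}", "", metadata.get("title", ""), ""]
    lines.append(f"Version: {metadata.get('version', 'unknown')}")
    lines.append(f"License: {metadata.get('license', 'unspecified')}")

    contributors = metadata.get("contributors", [])
    if contributors:
        lines += ["", "## Contributors", ""]
        lines += [f"- {person}" for person in contributors]

    resources = metadata.get("resources", [])
    if resources:
        lines += ["", "## Files", ""]
        lines += [f"- [{res['name']}]({res['url']})" for res in resources]

    return "\n".join(lines) + "\n"


# The canonical metadata would normally be read from the JSON file on disk;
# an inline string keeps the sketch self-contained.
canonical = json.loads("""
{
  "name": "traitecoevo/taxonlookup",
  "title": "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license": "CC0",
  "version": "1.0.2",
  "contributors": ["Will Cornwell", "Rich FitzJohn", "Matt Pennell"],
  "resources": [{
    "name": "plant_lookup",
    "url": "https://github.com/traitecoevo/taxonlookup/releases/download/v1.0.2/plant_lookup.csv"
  }]
}
""")

readme = render_readme(canonical)
print(readme)
```

The same dict could feed a second renderer for the R documentation, so that only the JSON file is ever edited by hand.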
Yeah, this is not too much work now that I have the automatic uploading thing worked out. We'd just hook into the same set of routines.
I think I'd opt to put the json in with the releases themselves, and have the URIs in the release json resolve to the github release URIs. So for taxonlookup it would read:
{
  "name" : "traitecoevo/taxonlookup",
  "title" : "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license" : "CC0",
  "sources" : [{
    "name": "The plant list",
    "web": "http://www.theplantlist.org"
  }],
  "author": "Will Cornwell <[email protected]>",
  "contributors": [
    "Will Cornwell <[email protected]>",
    "Rich FitzJohn <[email protected]>",
    "Matt Pennell <[email protected]>"
  ],
  "version": "1.0.2",
  "resources": [{
    "url": "https://github.com/traitecoevo/taxonlookup/releases/download/v1.0.2/plant_lookup.csv",
    "name": "plant_lookup",
    "format": "csv",
    "hash": "sha1:cf6bb45eed09973d599e97fa8a6b8234c084e52a"
  }]
}
As you can see, most of that can be pulled from the DESCRIPTION file, so that's easy enough.
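To make the "gettable from the DESCRIPTION file" point concrete, here is a hedged Python sketch that reads the key/value fields of an R DESCRIPTION file and assembles a descriptor shaped like the JSON above. The helper names (`parse_description`, `build_descriptor`) are hypothetical, the parser handles only simple `Key: value` lines with indented continuations, and the release URL pattern is inferred from the single example URL in this thread. It is not how datastorr does this.

```python
import hashlib
import posixpath


def parse_description(text: str) -> dict:
    """Parse simple 'Key: value' fields; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            raw_key, value = line.split(":", 1)
            key = raw_key.strip()
            fields[key] = value.strip()
    return fields


def build_descriptor(description_text: str, repo: str, filename: str, content: bytes) -> dict:
    """Assemble a data-package-style dict for one released file (hypothetical helper)."""
    fields = parse_description(description_text)
    version = fields["Version"]
    stem, ext = posixpath.splitext(filename)
    return {
        "name": repo,
        "title": fields.get("Title", ""),
        "license": fields.get("License", ""),
        "version": version,
        "resources": [{
            # URL pattern assumed from the example in this thread
            "url": f"https://github.com/{repo}/releases/download/v{version}/{filename}",
            "name": stem,
            "format": ext.lstrip("."),
            "hash": "sha1:" + hashlib.sha1(content).hexdigest(),
        }],
    }


# Illustrative DESCRIPTION text and file contents, not the real taxonlookup files.
description = """Package: taxonlookup
Title: A dynamically-updating versioned taxonomic resource
    for vascular plants
Version: 1.0.2
License: CC0
"""
file_bytes = b"genus,family\n"

descriptor = build_descriptor(description, "traitecoevo/taxonlookup", "plant_lookup.csv", file_bytes)
print(descriptor["resources"][0]["url"])
```

Fields such as `sources` and `contributors` are left out here because they would need either extra DESCRIPTION parsing or a hand-written supplement.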
Looks good.
(In reply to Rich FitzJohn's comment of Tue, Jan 12, 2016 at 8:20 PM: https://github.com/richfitz/datastorr/issues/2#issuecomment-170848174)
I agree, the specific meta-data for the columns might take a bit of organizing...
BTW, I like the new datastorr release feature. Worked the first time.
The column-specific metadata is someone else's problem, I think. Not all the data stored this way will be tabular, in any case. So as long as there's a facility for including it (most trivially a JSON file somewhere in the repo that would get slurped in), that should be enough.
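A minimal sketch of the "slurp in a JSON file if it exists" facility might look like the following. The function name `attach_extra_metadata`, the file name `column_metadata.json`, and the `"metadata"` key it writes to are all hypothetical choices for illustration; nothing in this thread fixes them.

```python
import json
import tempfile
from pathlib import Path


def attach_extra_metadata(descriptor: dict, extra_path) -> dict:
    """Return a copy of descriptor with an optional JSON file merged in.

    If the file is absent the descriptor is returned unchanged, so packages
    without column-level (or any other) extra metadata need no special casing.
    """
    path = Path(extra_path)
    if not path.exists():
        return descriptor
    merged = dict(descriptor)
    # "metadata" is a hypothetical key chosen for this sketch
    merged["metadata"] = json.loads(path.read_text(encoding="utf-8"))
    return merged


# Demonstration with a throwaway directory standing in for the repo.
with tempfile.TemporaryDirectory() as tmp:
    extra_file = Path(tmp) / "column_metadata.json"
    # Made-up column description, for illustration only
    extra_file.write_text(json.dumps({"columns": {"genus": "Genus name"}}), encoding="utf-8")

    base = {"name": "traitecoevo/taxonlookup"}
    with_extra = attach_extra_metadata(base, extra_file)
    without_extra = attach_extra_metadata(base, Path(tmp) / "missing.json")

print(with_extra)
print(without_extra)
```

Because the extra file is treated as an opaque JSON blob, the same hook would work for non-tabular data too.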