Integrate with OKFN data packages

Open richfitz opened this issue 8 years ago • 7 comments

Adding a packages.json file that contains the metadata would probably satisfy most of the requirements.

richfitz avatar Dec 18 '15 16:12 richfitz

Here's the website with a bit more information http://data.okfn.org/doc/data-package

Importantly, this can be in addition to what we currently have and would allow better interoperability. I don't believe there is good R tooling for dealing with data packages yet, though.

richfitz avatar Jan 08 '16 10:01 richfitz

So for taxonlookup, we take what's now in the github readme.md and put it in a .json file? I guess ideally it would also be in the R documentation? We would need a system where the meta-data in one place (the json file?) is canonical and the other two are generated from it.

wcornwell avatar Jan 09 '16 06:01 wcornwell

I really like the idea of the OKFN data packages, so in principle it would be great to support them. Depends how much work it is. Seems low cost.

Generating the readme from a single canonical source for metadata shouldn't be too hard. I tried something like this a while back, where I used a json file with metadata to write the readme (see readme.Rmd in github.com/dfalster/Falster_2005_JEcol_data). Now I know there is an actual preferred format for that metadata.
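
To make the "one canonical source" idea concrete, here is a minimal sketch of rendering a readme from a metadata dictionary. It is in Python purely for illustration (the real thing would live in R), and the function name `render_readme`, the field names, and the example values are all assumptions, not anything datastorr provides.

```python
import json


def render_readme(meta):
    """Build markdown readme text from a metadata dict (hypothetical layout)."""
    lines = ["# " + meta["title"], ""]
    lines.append("Version: " + meta["version"])
    lines.append("License: " + meta["license"])
    # Optional sections are only written when the metadata provides them.
    if meta.get("sources"):
        lines += ["", "## Sources", ""]
        for source in meta["sources"]:
            lines.append("- {} ({})".format(source["name"], source["web"]))
    if meta.get("contributors"):
        lines += ["", "## Contributors", ""]
        for person in meta["contributors"]:
            lines.append("- " + person)
    return "\n".join(lines) + "\n"


# Made-up metadata standing in for the canonical json file.
example = json.loads("""
{
  "title": "Example dataset",
  "version": "0.1.0",
  "license": "CC0",
  "sources": [{"name": "The plant list", "web": "http://www.theplantlist.org"}],
  "contributors": ["A. Contributor"]
}
""")

readme_text = render_readme(example)
```

The same dictionary could feed the R documentation, so the json file stays the only place that gets edited by hand.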

dfalster avatar Jan 12 '16 06:01 dfalster

Yeah, this is not too much work now that I have the automatic uploading thing worked out. We'd just hook into the same set of routines.

I think I'd opt to put the json in with the releases themselves, and have the URIs in the release json resolve to the github release URIs. So for taxonlookup it would read:

{
  "name" : "traitecoevo/taxonlookup",
  "title" : "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license" : "CC0",
  "sources" : [{
    "name": "The plant list",
    "web": "http://www.theplantlist.org"
  }],
  "author": "Will Cornwell <[email protected]>",
  "contributors": [
    "Will Cornwell <[email protected]>",
    "Rich FitzJohn <[email protected]>",
    "Matt Pennell <[email protected]>"
  ],
  "version": "1.0.2",
  "resources": [{
    "url": "https://github.com/traitecoevo/taxonlookup/releases/download/v1.0.2/plant_lookup.csv",
    "name": "plant_lookup",
    "format": "csv",
    "hash": "sha1:cf6bb45eed09973d599e97fa8a6b8234c084e52a"
  }]
}

As you can see, most of that is gettable from the DESCRIPTION file, so that's easy enough.
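
A rough sketch of that derivation, in Python for illustration only (the real hook would be in R alongside the upload routines). The helper names `parse_description`, `sha1_of` and `build_descriptor` are hypothetical, the parser handles only plain `Key: value` fields with indented continuation lines, and the `v` tag prefix is assumed from the release URL in the example above.

```python
import hashlib
import os


def parse_description(text):
    """Parse 'Key: value' fields; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def sha1_of(data):
    """Return the hash string in the 'sha1:<hex>' form used in the example."""
    return "sha1:" + hashlib.sha1(data).hexdigest()


def build_descriptor(description_text, repo, filename, file_bytes):
    """Assemble a descriptor from DESCRIPTION fields plus one release file."""
    d = parse_description(description_text)
    version = d["Version"]
    # Assumed URL layout: GitHub release assets under a 'v<version>' tag.
    url = "https://github.com/{}/releases/download/v{}/{}".format(
        repo, version, filename)
    stem, ext = os.path.splitext(filename)
    return {
        "name": repo,
        "title": d["Title"],
        "license": d["License"],
        "version": version,
        "resources": [{
            "url": url,
            "name": stem,
            "format": ext.lstrip("."),
            "hash": sha1_of(file_bytes),
        }],
    }
```

Fields such as `sources` are not in DESCRIPTION, so they would still have to come from somewhere else.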

richfitz avatar Jan 12 '16 09:01 richfitz

Looks good.

dfalster avatar Jan 12 '16 09:01 dfalster

I agree, the specific meta-data for the columns might take a bit of organizing...

BTW, I like the new datastorr release feature. Worked the first time.

wcornwell avatar Jan 13 '16 03:01 wcornwell

The column-specific meta-data is someone else's problem, I think. Not all the data stored this way will be tabular, in any case. So as long as there's a facility for including it (most trivially a json file somewhere in the repo that would get slurped in), we should be fine.
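
One way the "slurp in a json file" facility could look, sketched in Python for illustration. Everything here is an assumption: the function name `attach_column_metadata`, a file that maps resource names to lists of column descriptions, and storing the result under a `schema` key with a `fields` list on each resource.

```python
import copy
import json
import os


def attach_column_metadata(descriptor, path):
    """Return a copy of descriptor with column metadata added where available."""
    result = copy.deepcopy(descriptor)
    if not os.path.exists(path):
        # Non-tabular data, or nobody has documented the columns yet.
        return result
    with open(path) as f:
        by_resource = json.load(f)
    for resource in result["resources"]:
        fields = by_resource.get(resource["name"])
        if fields is not None:
            resource["schema"] = {"fields": fields}
    return result
```

Resources without an entry in the file are left untouched, so non-tabular data needs no special handling.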

richfitz avatar Jan 13 '16 09:01 richfitz