Add datasets from data hub awesome collection

Open henrykironde opened this issue 5 years ago • 7 comments

Collections

henrykironde avatar Jan 16 '20 18:01 henrykironde

Almost all the datasets in the Datahub collection already use their own datapackage.json to define the schema of the data. https://datahub.io/docs/core-data https://frictionlessdata.io/data-packages/

So could we build a module that parses datapackage.json and restructures it into our format?

I can work on that if this seems like a good idea.
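
As a rough illustration of the parsing step proposed above, here is a minimal sketch using only the standard library. It assumes the descriptor follows the Frictionless layout (a top-level `resources` list, each with a `schema.fields` list). The function name `summarize_datapackage` and the sample descriptor are hypothetical and are not part of retriever or Datahub.

```python
import json


def summarize_datapackage(descriptor):
    """Map each resource name to a list of (field name, field type) pairs.

    `descriptor` is the parsed content of a datapackage.json file.
    Fields without an explicit type are treated as "string", which is
    the default type in the Table Schema specification.
    """
    tables = {}
    for resource in descriptor.get("resources", []):
        fields = resource.get("schema", {}).get("fields", [])
        tables[resource["name"]] = [
            (field["name"], field.get("type", "string")) for field in fields
        ]
    return tables


# Hypothetical descriptor, inlined so the sketch runs without a download.
example = json.loads("""
{
  "name": "example-package",
  "resources": [
    {
      "name": "observations",
      "path": "data/observations.csv",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "value", "type": "number"},
          {"name": "note"}
        ]
      }
    }
  ]
}
""")

print(summarize_datapackage(example))
```

The resulting dictionary would still need to be restructured into whatever layout retriever scripts expect. That step is left out here because the thread does not spell it out.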

mayurdeep avatar Jan 19 '20 12:01 mayurdeep

@pathak-mayurdeep, that's great. We should add all the data packages here: https://github.com/weecology/retriever/blob/master/scripts/datapackages.yml and then create the module.

henrykironde avatar Jan 19 '20 14:01 henrykironde

@pathak-mayurdeep Here's our existing (unmerged) work from a couple of years ago on using existing Frictionless Data data packages: https://github.com/weecology/retriever/pull/980 Feel free to build off this if helpful.

ethanwhite avatar Jan 19 '20 17:01 ethanwhite

Or maybe this got included in #1010? @henrykironde - #980 suggests that everything went into #1010, but I don't remember us getting all the way to loading external packages. Do you remember the status of that work as of #1010?

ethanwhite avatar Jan 19 '20 17:01 ethanwhite

We had reached a point where we could ingest some of the data, but then the specifications changed. Currently we need to create a dictionary for the data types.
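
A data type dictionary of the kind mentioned above might look roughly like the sketch below. The keys are type names from the Frictionless Table Schema specification. The target names on the right (`int`, `double`, `char`, `bool`) and the fallback to `char` are assumptions made for illustration and should be checked against what retriever actually accepts. `FRICTIONLESS_TO_RETRIEVER` and `convert_field_type` are hypothetical names.

```python
# Assumed mapping from Table Schema field types to retriever-style types.
# The right-hand values are illustrative, not confirmed retriever type names.
FRICTIONLESS_TO_RETRIEVER = {
    "integer": "int",
    "number": "double",
    "string": "char",
    "boolean": "bool",
    # Date-like types are kept as text in this sketch.
    "date": "char",
    "time": "char",
    "datetime": "char",
    "year": "int",
}


def convert_field_type(frictionless_type):
    """Return the mapped type, falling back to "char" for unknown types."""
    return FRICTIONLESS_TO_RETRIEVER.get(frictionless_type, "char")


fields = [
    {"name": "id", "type": "integer"},
    {"name": "value", "type": "number"},
    {"name": "location", "type": "geopoint"},
]
converted = [(f["name"], convert_field_type(f["type"])) for f in fields]
print(converted)
```

Falling back to a text type for anything unrecognized keeps ingestion from failing on less common types such as `geopoint`, at the cost of losing type information for those columns.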

henrykironde avatar Jan 19 '20 17:01 henrykironde

Thanks for all the info. I'll start working on this.

mayurdeep avatar Jan 20 '20 04:01 mayurdeep

Sorry about the late PR. I was offline for several days due to some personal health issues.

I have used datapackage-py in this. Please let me know what changes are needed.

mayurdeep avatar Jan 28 '20 20:01 mayurdeep