Add datasets from data hub awesome collection
Collections
-
Bibliographic data note:
The raw dataset is XML, a good example of adding ingestion for XML formats; see https://github.com/weecology/retriever/wiki/GSoC-2020-Project-Ideas#approach
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
Machine Learning / Statistical
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
List of datapackages created
- [ ] add
-
YAGO
List of datapackages created
- [ ] add
Almost all the datasets in the Datahub collection already use their own datapackage.json to define the schema of the data.
https://datahub.io/docs/core-data
https://frictionlessdata.io/data-packages/
So could we build a module that parses datapackage.json and restructures it into our format?
I can work on that if this seems like a good idea.
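A rough sketch of what such a module could look like, using only the standard library. The function name `datapackage_to_script` and the keys of the returned dict are hypothetical, chosen for illustration; they are not retriever's actual script format, and the example descriptor is made up.

```python
import json


def datapackage_to_script(descriptor):
    """Restructure a Data Package descriptor (dict) into a simplified script dict.

    The output layout is an assumption for illustration only.
    """
    tables = {}
    for resource in descriptor.get("resources", []):
        fields = resource.get("schema", {}).get("fields", [])
        tables[resource["name"]] = {
            # Data Packages use "path"; some older descriptors use "url".
            "url": resource.get("path") or resource.get("url"),
            # Table Schema treats a field without a type as "string".
            "columns": [(f["name"], f.get("type", "string")) for f in fields],
        }
    return {
        "name": descriptor["name"],
        "title": descriptor.get("title", ""),
        "tables": tables,
    }


# Made-up descriptor standing in for a downloaded datapackage.json.
example = json.loads("""
{
  "name": "example-package",
  "title": "Example package",
  "resources": [
    {
      "name": "observations",
      "path": "data/observations.csv",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "species"}
        ]
      }
    }
  ]
}
""")

script = datapackage_to_script(example)
```

In practice the descriptor would be read from the dataset's own datapackage.json with `json.load` rather than from an inline string.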
@pathak-mayurdeep, that's great. We should add all the datapackages here: https://github.com/weecology/retriever/blob/master/scripts/datapackages.yml and then create the module.
@pathak-mayurdeep Here's our existing (unmerged) work from a couple of years ago on using existing Frictionless Data data packages: https://github.com/weecology/retriever/pull/980 Feel free to build off this if helpful.
Or maybe this got included in #1010? @henrykironde - #980 suggests that everything went into #1010, but I don't remember us getting all the way to loading external packages. Do you remember the status of that work as of #1010?
We had reached a point where we could ingest some of the data, but then the specifications changed. Currently, we need to create a dictionary for the data types.
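For illustration, the data-type dictionary could be a plain mapping from Table Schema field types to column types. The keys below are the standard Table Schema type names; the right-hand values and the `convert_type` helper are assumptions and would need to be checked against the types the engines actually accept.

```python
# Table Schema field type -> column type.
# The target type names on the right are assumed for this sketch.
TYPE_MAP = {
    "string": "char",
    "integer": "int",
    "number": "double",
    "boolean": "bool",
    "date": "char",
    "time": "char",
    "datetime": "char",
    "year": "int",
}


def convert_type(field_type):
    """Return the mapped column type, falling back to "char" for unknown types."""
    return TYPE_MAP.get(field_type, "char")
```

Falling back to a text column for unmapped types (for example `geojson` or `object`) keeps ingestion from failing, at the cost of losing type information for those fields.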
Thanks for all the info. I'll start working on this.
Sorry about the late PR; I was offline for several days due to some personal health issues.
I have used datapackage-py in this. Please let me know what changes are needed.