ckanext-datapackager
Incorrect properties in datapackage.json download
We are testing v1.0.0 of this extension on CKAN version 2.4.
I have:
- uploaded a zip file
- which is published here
- when I download the datapackage.json it contains incorrect properties:
CKAN Maintainer is returned as Data Package Author
Returned:
"author": {
"name": "[email protected]",
"email": "[email protected]"
}
This doesn't follow the specification. I suspect this should follow data package contributors, e.g.
"contributors": [{
"title": "ckan_maintainer",
"email": "ckan_maintainer_email",
"path": "ckan_organization_url",
"role": "maintainer"
}]
Note: title is a mandatory property.
Not 100% sure what to set the contributor's path value to. Should this be:
- the url to the organisation in ckan, e.g. https://www.data.gov.au/organization/about/australianbureauofstatistics, or
- the url that points to the organisation's home page?

CKAN Author is returned as Data Package Sources
Returned:
"sources": [
{
"name": "[email protected]",
"email": "[email protected]"
}
]
I suspect this should follow data package sources, e.g.
"sources": [{
"title": "ckan_author",
"email": "ckan_author_email",
"path": "ckan_dataset_url?"
}]
Note: title is a mandatory property.
The spec states that sources is an array of the raw sources for this data package, so I'm not sure if this property is being used correctly (see note on forum and #30 in datahub.io). Perhaps it would make more sense to create another contributor role for "author" and not output sources.
Code to fix is in another repo https://github.com/frictionlessdata/ckan-datapackage-tools/blob/master/ckan_datapackage_tools/converter.py
Implementation
@Stephen-Gates these fields are really loosely defined and can be interpreted in many ways. I'm happy to agree on a convention for ckanext-datapackager, and what you propose is really sensible (CKAN maintainer -> DP contributor with role maintainer, CKAN author -> DP contributor with role author). Some comments though:
- CKAN author name and maintainer name are not mandatory. The specs require title (wrongly IMO), so what happens if there is only the email provided in CKAN?
- The specs allow for other role values (e.g. wrangler(!)). When importing a DP I would not map these to either author or maintainer, so I guess we should save the whole contributors object as an extra and let instances decide what to do with them.
- In pretty much all cases I know of, the maintainer fields and the organization a dataset belongs to are not linked, so I'm not sure about using the org url (either the ckan page or the org website) for path.
- Regarding your suggestion to dump CKAN author in sources, I'm not too convinced. Again, author can mean different things to different people. The note in the specs seems to suggest that these should be used to point to the original dataset.
So, my proposal:
- When ingesting DP:
  - First DP contributor with role maintainer --> CKAN maintainer, First DP contributor with role author --> CKAN author
  - Store the whole contributors object as extra
- When exporting to DP:
  - If there is a contributors extra use that
  - Alternatively, if there are CKAN maintainer or author fields use these for contributors with the appropriate role
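For illustration, the export side of this proposal could look roughly like the sketch below. It is not the code in ckan-datapackage-tools: the function name contributors_from_ckan is hypothetical, the contributors extra is assumed to be stored as a JSON string under the key "contributors", and falling back to the email when no name is set is only one possible answer to the mandatory title question raised above.

```python
import json

# (role, CKAN name field, CKAN email field)
ROLE_FIELDS = (
    ("maintainer", "maintainer", "maintainer_email"),
    ("author", "author", "author_email"),
)


def contributors_from_ckan(dataset):
    """Build a Data Package contributors list from a CKAN dataset dict.

    Hypothetical helper, not part of any existing library.
    """
    # A stored contributors extra wins (assumed to be a JSON string).
    for extra in dataset.get("extras", []):
        if extra.get("key") == "contributors":
            return json.loads(extra["value"])

    # Otherwise derive contributors from the maintainer/author fields.
    contributors = []
    for role, name_field, email_field in ROLE_FIELDS:
        name = dataset.get(name_field)
        email = dataset.get(email_field)
        if not (name or email):
            continue
        # Assumption: use the email as title when no name is provided.
        contributor = {"title": name or email, "role": role}
        if email:
            contributor["email"] = email
        contributors.append(contributor)
    return contributors
```

No path is emitted here, since the thread has not settled what it should point to.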
Estimate
1 day
When ingesting DP:
- First DP contributor with role maintainer --> CKAN maintainer, First DP contributor with role author --> CKAN author
- Store the whole contributors object as extra
agreed
When exporting to DP:
- If there is a contributors extra use that
- Alternatively, if there are CKAN maintainer or author fields use these for contributors with the appropriate role
agreed
My principle is, what goes up, must come down. ☂️
Note: check that the presence of author does not return an error, as mentioned in #51, and that it is just ignored.
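A rough sketch of the ingest direction agreed above, written so that the full contributors list survives a round trip. The function name ckan_fields_from_contributors is hypothetical, and storing the list as a JSON string in an extra keyed "contributors" is an assumption, not something the extension is confirmed to do.

```python
import json


def ckan_fields_from_contributors(datapackage):
    """Map Data Package contributors onto CKAN dataset fields.

    Hypothetical helper, not part of any existing library.
    """
    contributors = datapackage.get("contributors") or []
    fields = {}

    for role in ("maintainer", "author"):
        # Only the first contributor with a given role is mapped.
        match = next((c for c in contributors if c.get("role") == role), None)
        if match is None:
            continue
        if match.get("title"):
            fields[role] = match["title"]
        if match.get("email"):
            fields[role + "_email"] = match["email"]

    # Keep the whole list (other roles included) so export can restore it.
    # Assumption: extras hold string values, so the list is JSON-encoded.
    if contributors:
        fields["extras"] = [
            {"key": "contributors", "value": json.dumps(contributors)}
        ]
    return fields
```

Contributors with other roles, such as wrangler, are not mapped to any CKAN field; they are only preserved in the extra.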