ckanext-datapackager

Incorrect properties in datapackage.json download

Open Stephen-Gates opened this issue 8 years ago • 3 comments

We are testing v1.0.0 of this extension on CKAN version 2.4.

I have:

CKAN Maintainer is returned as Data Package Author

Returned:

  "author": {
    "name": "[email protected]", 
    "email": "[email protected]"
   }

This doesn't follow the specification. I suspect this should follow data package contributors, e.g.

"contributors": [{
    "title": "ckan_maintainer",
    "email": "ckan_maintainer_email",
    "path": "ckan_organization_url",
    "role": "maintainer"
  }]

Note: title is a mandatory property.

Not 100% sure about setting the contributors path value. Should this be:

  • the url to the organisation in ckan, e.g. https://www.data.gov.au/organization/about/australianbureauofstatistics, or
  • the url that points to the organisation's home page?


CKAN Author is returned as Data Package Sources

Returned:

 "sources": [
    {
      "name": "[email protected]", 
      "email": "[email protected]"
    }
  ]

I suspect this should follow data package sources, e.g.

"sources": [{
  "title": "ckan_author",
  "email": "ckan_author_email",
  "path": "ckan_dataset_url?"
}]

Note: title is a mandatory property.

The spec states that sources is an array of the raw sources for this data package, so I'm not sure if this property is being used correctly (see note on forum and #30 in datahub.io). Perhaps it would make more sense to create another contributor role for "author" and not output sources.

The code to fix is in another repo: https://github.com/frictionlessdata/ckan-datapackage-tools/blob/master/ckan_datapackage_tools/converter.py

Stephen-Gates avatar Nov 29 '17 04:11 Stephen-Gates

Implementation

@Stephen-Gates these fields are really loosely defined and can be interpreted in many ways. I'm happy to agree on a convention for ckanext-datapackager, and what you propose is really sensible (CKAN maintainer -> DP contributor with role maintainer, CKAN author -> DP contributor with role author). Some comments though:

  • CKAN author name and maintainer name are not mandatory. The specs require title (wrongly IMO) so what happens if there is only the email provided in CKAN?
  • The specs allow for other role values (e.g. wrangler (!)). When importing a DP I would not map these to either author or maintainer, so I guess we should save the whole contributors object as an extra and let instances decide what to do with them.
  • In pretty much all cases I know of, the maintainer fields and the organization a dataset belongs to are not linked, so I'm not sure about using the org url (either the ckan page or the org website) for path
  • Regarding your suggestion to dump CKAN author in sources, I'm not too convinced. Again, author can mean different things to different people. The note in the specs seems to suggest that these should be used to point to the original dataset.

So, my proposal:

  • When ingesting DP:
    • First DP contributor with role maintainer --> CKAN maintainer, First DP contributor with role author --> CKAN author
    • Store the whole contributors object as extra
  • When exporting to DP:
    • If there is a contributors extra use that
    • Alternatively, if there are CKAN maintainer or author fields use these for contributors with the appropriate role
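
A rough sketch of how the proposal above could look in Python. This is not the actual code in ckan-datapackage-tools: the function names `ingest_contributors` and `export_contributors` are hypothetical, storing the contributors list as a JSON string in a CKAN extra named `contributors` is an assumption, and falling back to the email as `title` when the CKAN name is empty is just one possible answer to the open question about the mandatory title.

```python
import json

# Roles that have a matching pair of CKAN fields (<role>, <role>_email).
ROLES = ("maintainer", "author")


def ingest_contributors(datapackage):
    """Map Data Package contributors to CKAN dataset fields (hypothetical helper)."""
    dataset = {}
    contributors = datapackage.get("contributors") or []
    for role in ROLES:
        # The first contributor with this role fills the CKAN field.
        match = next((c for c in contributors if c.get("role") == role), None)
        if match is None:
            continue
        if match.get("title"):
            dataset[role] = match["title"]
        if match.get("email"):
            dataset[role + "_email"] = match["email"]
    if contributors:
        # Assumption: keep the whole list as a JSON string in an extra,
        # so that other roles (e.g. wrangler) survive a round trip.
        dataset["extras"] = [
            {"key": "contributors", "value": json.dumps(contributors)}
        ]
    return dataset


def export_contributors(dataset):
    """Build the Data Package contributors list from a CKAN dataset dict."""
    # Prefer the stored contributors extra when there is one.
    for extra in dataset.get("extras") or []:
        if extra.get("key") == "contributors":
            return json.loads(extra["value"])
    contributors = []
    for role in ROLES:
        name = dataset.get(role)
        email = dataset.get(role + "_email")
        if not (name or email):
            continue
        # Assumption: use the email as title when CKAN has no name,
        # because the spec requires a title.
        contributor = {"title": name or email, "role": role}
        if email:
            contributor["email"] = email
        contributors.append(contributor)
    return contributors
```

With this shape, exporting a dataset that was ingested from a data package returns the original contributors list unchanged, including roles CKAN has no field for.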

Estimate

1 day

amercader avatar Feb 08 '18 20:02 amercader

When ingesting DP:

  • First DP contributor with role maintainer --> CKAN maintainer, First DP contributor with role author --> CKAN author
  • Store the whole contributors object as extra

agreed

When exporting to DP:

  • If there is a contributors extra use that
  • Alternatively, if there are CKAN maintainer or author fields use these for contributors with the appropriate role

agreed

My principle is, what goes up, must come down. ☂️

Stephen-Gates avatar Feb 08 '18 21:02 Stephen-Gates

Note: check that the presence of author does not return an error (as mentioned in #51) and that it is just ignored.

amercader avatar Feb 09 '18 09:02 amercader