ckanext-datapackager
Provide more guidance when importing Data Packages
In general, the process of importing Data Packages is confusing and error-prone, especially if you are not familiar with Data Packages. Apart from the obvious bugs (#37, #39) and the confusing issues (#38), I'd say that most people will struggle to import their Data Packages the first time, unless:
- They upload/link to a `datapackage.json` with `url` on their resources
- They upload/link to a zip file including a `datapackage.json` with `path` on their resources and the data files.
If these are the only two situations we can support, fair enough, but maybe we should provide helper text explaining what users should upload.
I commented on https://github.com/ckan/ckanext-datapackager/issues/38 about the first issue of a `datapackage.json` without `url` on the resources. The issue wasn't that the resources didn't have a URL, but that they only had a `path` that didn't exist.
As the code is now, I would say that users will be able to import a good data package without issues (https://github.com/ckan/ckanext-datapackager/issues/39 was fixed). Users will struggle if the data package isn't valid, or if any of its resources' data isn't available, as the error messages are somewhat confusing.
The current "happy path" is:
- `datapackage.json` with all resources' data remotely accessible (no inline or local resources);
- ZIP file with `datapackage.json` at the root level;
That seems sensible for a first version IMO. We might also want to add support for ZIP files with the datapackages inside a single folder (https://github.com/okfn/datapackage-py/issues/29), like the GitHub exports, but it's not essential.
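To make the ZIP cases concrete, here is a rough sketch of how the descriptor could be located, first at the root level and then inside a single top-level folder (the GitHub export layout). The function name `find_descriptor` is hypothetical and uses only the standard library; it is not taken from ckanext-datapackager or datapackage-py.

```python
import io
import zipfile


def find_descriptor(zf, filename="datapackage.json"):
    """Return the archive path of the descriptor, or None if not found.

    Hypothetical helper: checks the root level first, then falls back to
    a single top-level folder (e.g. "repo-master/datapackage.json").
    """
    names = zf.namelist()
    if filename in names:
        return filename

    # Collect the first path segment of every entry in the archive.
    top_levels = {name.split("/", 1)[0] for name in names}
    if len(top_levels) == 1:
        candidate = "{}/{}".format(next(iter(top_levels)), filename)
        if candidate in names:
            return candidate
    return None


def make_zip(paths):
    """Build an in-memory ZIP with empty JSON files, for illustration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in paths:
            zf.writestr(path, "{}")
    buf.seek(0)
    return zipfile.ZipFile(buf)


root_zip = make_zip(["datapackage.json", "data/a.csv"])
folder_zip = make_zip(["repo-master/datapackage.json", "repo-master/data/a.csv"])
print(find_descriptor(root_zip))
print(find_descriptor(folder_zip))
```

Returning `None` instead of raising leaves it to the caller to decide how to word the error shown to the user.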
It would be useful to think about some common use cases with invalid datapackages (invalid for importing on CKAN), so we can make sure we handle them well.
I can think of:
- datapackage.json with resources with only local data;
- datapackage.json with resources with only inline data;
- datapackage.json with resources without data;
- Invalid datapackage (e.g. without a "name" attribute);
- Invalid datapackage JSON file;
- ZIP file without datapackage.json in the root folder.
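As a rough sketch, the descriptor-level cases above could be detected before import and turned into clearer messages. Everything here is an assumption for illustration: `find_import_problems` is a hypothetical name, the message wording is made up, and the check assumes resources carry their data under `url`, `path` (local) or `data` (inline). It does not use the datapackage-py validation API.

```python
import json


def find_import_problems(descriptor_text):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical pre-import check, not the extension's actual behaviour.
    """
    try:
        descriptor = json.loads(descriptor_text)
    except ValueError:
        return ["The file is not valid JSON."]
    if not isinstance(descriptor, dict):
        return ["The descriptor must be a JSON object."]

    problems = []
    if not descriptor.get("name"):
        problems.append('The data package has no "name" attribute.')

    for index, resource in enumerate(descriptor.get("resources") or []):
        if not isinstance(resource, dict):
            problems.append("Resource #{} is not a JSON object.".format(index))
            continue
        label = resource.get("name") or "#{}".format(index)
        if resource.get("url"):
            continue  # remotely accessible data: the supported case
        if "path" in resource:
            problems.append("Resource {} only has local data.".format(label))
        elif "data" in resource:
            problems.append("Resource {} only has inline data.".format(label))
        else:
            problems.append("Resource {} has no data.".format(label))
    return problems


example = '{"resources": [{"name": "prices", "path": "data/prices.csv"}]}'
for problem in find_import_problems(example):
    print(problem)
```

Collecting every problem rather than stopping at the first one would let the import form report them all in one go.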
What else?
@vitorbaptista
cc @amercader
Can you please distill this into a task list that could be actionable?
Suggest the action is to improve the importing section of the README to describe which data packages are valid and which are invalid to upload.
This could be complemented with valid data package examples.
Implementation
- [ ] Extend the Import section in the README with a description of the formats supported
- [x] Add a helper text in the form that suggests what to upload / link to (zipped data package or link to `datapackage.json`) and what is supported (Done in c2bd8a6)
Estimate
0.5 days