Integration with CKAN as a data portal

Open • roll opened this issue 5 years ago • 1 comment

Overview

An important step for Frictionless Framework is to support reading and writing packages from different data portals (CKAN/GitHub/Zenodo/etc.) so that users can publish and access their packages easily through a straightforward API. This issue covers the CKAN integration. The implementation is already prototyped in the v5 branch.

Specs

Read package

Read package from a dataset that has a datapackage.json/yaml:

package = Package("https://dados.gov.br/dataset/consumo-de-energia-orgaos-publicos")
package = Package.from_ckan(...) # alias

Read package from a dataset without a datapackage.json/yaml. We probably need to filter files and add only CSV/XLS(X) resources to the package, and CkanControl should make this configurable. We need to map as much of the metadata provided by CKAN as possible (for now, using frictionless-ckan-mapper):

package = Package("https://demo.ckan.org/dataset/sample-dataset-1")
package = Package("<link>", control=portals.CkanControl(formats=['csv']))
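
The format filtering described above could look roughly like the sketch below. It is not the v5 implementation: `select_resources` is a hypothetical helper, and the input is assumed to be the `result` dictionary of a CKAN `package_show` response, whose `resources` entries carry `name`, `url` and `format` fields. The full metadata mapping would still be delegated to frictionless-ckan-mapper.

```python
def select_resources(dataset, formats=("csv", "xls", "xlsx")):
    """Keep only the CKAN resources whose format is in `formats`.

    Hypothetical helper: `dataset` is assumed to be the "result" dict
    of a CKAN package_show response.
    """
    allowed = {fmt.lower() for fmt in formats}
    selected = []
    for resource in dataset.get("resources", []):
        # CKAN formats are free text ("CSV", "csv", None), so normalize first
        fmt = (resource.get("format") or "").strip().lower()
        if fmt in allowed:
            selected.append(
                {
                    "name": resource.get("name"),
                    "path": resource.get("url"),
                    "format": fmt,
                }
            )
    return selected
```

Passing `formats=["csv"]` mirrors the `CkanControl(formats=['csv'])` call in the spec; the default tuple reflects the CSV/XLS(X) suggestion.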

Write package

Publish a package on CKAN (for now, only if the dataset doesn't exist). We also need to support storing credentials in ENV/etc. We need to map as much of the metadata provided by Package as possible:

package.to_ckan()
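
As a rough illustration of the credentials requirement, the sketch below reads an API key from the environment and prepares (but does not send) a request to CKAN's `package_create` action. The function name and the `CKAN_APIKEY` variable name are assumptions made for this example; the endpoint path and the `Authorization` header are the standard CKAN Action API conventions.

```python
import json
import os
from urllib.request import Request


def build_package_create_request(base_url, dataset, env_var="CKAN_APIKEY"):
    """Prepare a POST request for CKAN's package_create action.

    Hypothetical helper: `env_var` is an assumed variable name, and
    `dataset` is a dict already mapped to CKAN's dataset schema.
    """
    apikey = os.environ.get(env_var)
    if not apikey:
        raise RuntimeError(f"Set {env_var} to a CKAN API key before publishing")
    return Request(
        f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Authorization": apikey, "Content-Type": "application/json"},
        method="POST",
    )
```

The returned request can be passed to `urllib.request.urlopen` once the target instance is known. Keeping the key out of the function arguments means it never has to appear in scripts or notebooks.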

Read catalog

Read catalog from a CKAN search. Design search configuration options such as limit and offset (pagination).

catalog = Catalog(control=portals.CkanControl(baseurl="<ckan-instance-url>", search="<frictionless>"))
for package in catalog.packages:
  print(package.name)
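
One possible way to map limit and offset onto CKAN is through the `rows` and `start` parameters of the `package_search` action, as in the sketch below. The helper names are hypothetical, and the real `CkanControl` options may end up named differently.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, limit=10, offset=0):
    """Build a package_search URL; limit/offset map to CKAN's rows/start."""
    params = urlencode({"q": query, "rows": limit, "start": offset})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def page_offsets(total, limit):
    """Offsets needed to walk `total` results in pages of `limit` items."""
    return range(0, total, limit)
```

The `total` value would come from the `count` field of the first `package_search` response, so a catalog reader can request the first page, read `count`, and then iterate over the remaining offsets.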

Plan

  • [ ] prototype the functionality based on the functional requirements
  • [ ] get feedback from @roll on the implementation
  • [ ] finish the implementation
  • [ ] design the testing approach (probably using pytest.vcr for reading, but from what instance? and how to test writing?) (sync with Adria on the best practices of testing against CKAN instances)
  • [ ] write a great deal of tests to be sure that the integration works correctly
  • [ ] write a comprehensive tutorial - https://framework.frictionlessdata.io/docs/tutorials/tutorials-overview (new section Portals Tutorials)

roll • Oct 13 '20 05:10

@roll With the new work on https://github.com/frictionlessdata/frictionless-ckan-mapper, this should be easier.

rufuspollock • Nov 05 '20 10:11