Integration with CKAN as a data portal
Overview
An important step for Frictionless Framework is the ability to read and write packages from different data portals (CKAN, GitHub, Zenodo, etc.) so that users can publish and access their packages easily through a straightforward API. This issue covers the CKAN integration. The implementation is already prototyped in the v5 branch.
Specs
Read package
Read package from a dataset that has a datapackage.json/yaml:
package = Package("https://dados.gov.br/dataset/consumo-de-energia-orgaos-publicos")
package = Package.from_ckan(...) # alias
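To read a package from a dataset page URL, the integration first has to turn that URL into a CKAN API call. Below is a minimal sketch of one way this could work, not the prototype's actual code. The helper names `parse_ckan_dataset_url` and `package_show_url` are hypothetical, and the sketch assumes the dataset page follows the usual `/dataset/<name>` path layout. The `package_show` action is part of the CKAN Action API.

```python
from urllib.parse import urlencode, urlparse


def parse_ckan_dataset_url(url):
    """Split a CKAN dataset page URL into (base_url, dataset_id).

    Hypothetical helper: assumes the path contains "/dataset/<name>".
    """
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    try:
        index = parts.index("dataset")
        dataset_id = parts[index + 1]
    except (ValueError, IndexError):
        raise ValueError(f"Not a CKAN dataset URL: {url}") from None
    # Keep any path prefix in case CKAN is mounted under a sub-path
    prefix = "/".join(parts[:index])
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    if prefix:
        base_url = f"{base_url}/{prefix}"
    return base_url, dataset_id


def package_show_url(url):
    """Build the Action API URL that returns the dataset metadata."""
    base_url, dataset_id = parse_ckan_dataset_url(url)
    query = urlencode({"id": dataset_id})
    return f"{base_url}/api/3/action/package_show?{query}"


print(package_show_url("https://dados.gov.br/dataset/consumo-de-energia-orgaos-publicos"))
```

The JSON returned by that endpoint would then be searched for a datapackage.json/yaml resource.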
Read a package from a dataset without a datapackage.json/yaml. We probably need to filter files and add only CSV/XLS(X) resources to the package, and CkanControl should make this configurable. We need to map as much of the metadata provided by CKAN as possible (for now, using frictionless-ckan-mapper):
package = Package("https://demo.ckan.org/dataset/sample-dataset-1")
package = Package("<link>", control=portals.CkanControl(formats=['csv']))
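The format filtering described above could look roughly like the following sketch. It operates on plain CKAN resource dictionaries, which carry a `format` field, and does not use frictionless-ckan-mapper. The function name `filter_resources` and its default format list are assumptions for illustration only.

```python
def filter_resources(resources, formats=("csv", "xls", "xlsx")):
    """Keep only the CKAN resources whose format is in `formats`.

    Hypothetical helper: comparison is case-insensitive because CKAN
    instances commonly store formats as "CSV", "csv", "XLSX", etc.
    """
    allowed = {fmt.lower() for fmt in formats}
    return [
        resource
        for resource in resources
        if (resource.get("format") or "").lower() in allowed
    ]


# Sample data shaped like the "resources" list of a CKAN dataset
resources = [
    {"name": "table", "format": "CSV", "url": "https://example.org/table.csv"},
    {"name": "report", "format": "PDF", "url": "https://example.org/report.pdf"},
    {"name": "sheet", "format": "xlsx", "url": "https://example.org/sheet.xlsx"},
    {"name": "unknown", "format": None, "url": "https://example.org/file"},
]

print([r["name"] for r in filter_resources(resources)])
print([r["name"] for r in filter_resources(resources, formats=["csv"])])
```

Passing `formats=["csv"]` mirrors the `CkanControl(formats=['csv'])` call above.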
Write package
Publish a package on CKAN (for now, only if the dataset doesn't exist yet). We also need to provide a way to store credentials in ENV/etc. We need to map as much of the metadata provided by Package as possible:
package.to_ckan()
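One possible way to handle credentials is to accept an explicit key and fall back to an environment variable. The sketch below is only an illustration: the variable name `CKAN_APIKEY` and both helper names are assumptions, not part of the prototype. CKAN accepts the API key or token in the `Authorization` request header.

```python
import os


def resolve_ckan_apikey(apikey=None, env_var="CKAN_APIKEY"):
    """Return an explicit key if given, else read it from the environment.

    The environment variable name is a hypothetical choice.
    """
    key = apikey or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No CKAN API key given and {env_var} is not set")
    return key


def auth_headers(apikey=None):
    """Build the headers for an authenticated CKAN write request."""
    return {"Authorization": resolve_ckan_apikey(apikey)}


# An explicit key takes precedence over the environment
print(auth_headers("explicit-key"))
```

Keeping the key out of the control object means it does not end up in serialized metadata by accident.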
Read catalog
Read a catalog from a CKAN search. We need to design search options such as limit and offset (pagination).
catalog = Catalog(control=portals.CkanControl(baseurl="<ckan-instance-url>", search="<frictionless>"))
for package in catalog.packages:
    print(package.name)
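Limit and offset map naturally onto the `rows` and `start` parameters of CKAN's `package_search` action. The sketch below shows how paginated search URLs might be built. The helper names `search_url` and `page_offsets` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def search_url(baseurl, search, limit=10, offset=0):
    """Build a package_search URL for one page of results."""
    query = urlencode({"q": search, "rows": limit, "start": offset})
    return f"{baseurl.rstrip('/')}/api/3/action/package_search?{query}"


def page_offsets(total, limit):
    """Return the offsets needed to walk through `total` results."""
    return list(range(0, total, limit))


# Walk through 12 matching datasets, 5 per page
for offset in page_offsets(total=12, limit=5):
    print(search_url("https://demo.ckan.org", "frictionless", limit=5, offset=offset))
```

The total would come from the `count` field in the first response, so a catalog could fetch pages lazily while iterating over `catalog.packages`.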
Plan
- [ ] prototype the functionality based on the functional requirements
- [ ] get feedback from @roll on the implementation
- [ ] finish the implementation
- [ ] design the testing approach (probably using pytest.vcr for reading, but from what instance? and how do we test writing?) (sync with Adria on the best practices of testing against CKAN instances)
- [ ] write a great deal of tests to be sure that the integration works correctly
- [ ] write a comprehensive tutorial - https://framework.frictionlessdata.io/docs/tutorials/tutorials-overview (new section Portals Tutorials)
@roll with the new work on https://github.com/frictionlessdata/frictionless-ckan-mapper this should be easier.