
Publish to CKAN Feature [Research/Discussion]

Open pdelboca opened this issue 1 year ago • 1 comments

Publishing to CKAN: Main Challenge

Publishing to CKAN can be tricky because it depends directly on the schema that each CKAN instance implements for its datasets and resources. As an example, here is the schema that opendata.swiss implements: https://ckan.opendata.swiss/api/3/action/scheming_dataset_schema_show?type=dataset.

In order to be able to publish, we need to know what CKAN expects. If CKAN does not provide that information, we cannot publish, so **requirement number 1:** we should be able to access the schema of the particular CKAN instance, or assume it is a vanilla implementation.

Since the goal of Open Data Editor is to be a tool for non-technical users, to properly implement this feature we need:

  • The CKAN instance to expose the schema (Like the opendata.swiss example)
  • With that schema, we should be able to generate a form for the user to fill in all the fields that the CKAN instance expects.
  • Once the form is filled, we should be able to publish to CKAN and properly report back errors.
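
As a rough sketch of the first step, the schema lookup could look like the following. It assumes the instance runs ckanext-scheming and exposes the same `scheming_dataset_schema_show` action as the opendata.swiss URL above; the function names are hypothetical and not part of ODE or Frictionless. The HTTP call itself is left out, so only the URL building and the handling of CKAN's `success`/`result` response envelope are shown.

```python
from urllib.parse import urlencode, urljoin


def scheming_schema_url(base_url: str, dataset_type: str = "dataset") -> str:
    """Build the URL of the ckanext-scheming schema action for one dataset type."""
    action = "api/3/action/scheming_dataset_schema_show"
    # Normalise the trailing slash so instances mounted under a sub-path also work.
    endpoint = urljoin(base_url.rstrip("/") + "/", action)
    return endpoint + "?" + urlencode({"type": dataset_type})


def unwrap_ckan_response(payload: dict) -> dict:
    """Return the `result` of a CKAN Action API response, or raise if the call failed."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]
```

If the endpoint is not exposed, the call fails and the caller would have to fall back to assuming a vanilla schema.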

We need better definitions

We need a proper definition of what it means to "Publish to CKAN":

  • Are we publishing datasets to CKAN?
  • Are we uploading a specific file as a new resource to an already created dataset?
  • Are we just re-uploading a specific file to an already existing CKAN resource?
  • Are we mapping datapackages to CKAN packages?
  • All of them?
  • Are we thinking of a UI form that simulates a CKAN upload form, or of a low-level API interaction for more technical users?

We will always have the schema issue, but going for simpler scopes (like just replacing a file in an already created CKAN dataset) should make it more manageable.

Current Implementation gaps

The current implementation is based on https://github.com/frictionlessdata/frictionless-ckan-mapper, which provides a set of hard-coded fields to map between a vanilla CKAN instance and Frictionless. The current Frictionless Portals documentation does not mention how to handle custom schemas, so I'm assuming that it is not implemented (maybe @roll can provide some context here?).

Even if it is implemented at the core of Frictionless, we will still need to work on the UI that powers the feature (or on a UI that works for technical users only).

Exposing the schema

The most widely used CKAN extension for customizing the schema is ckanext-scheming. However, not all instances expose the endpoint that shows the schema the way Open Data Swiss does. Without knowing what CKAN expects, it is not possible to define a UI form. We might be able to work only with the fields that we know for sure CKAN expects, especially if our goal is a feature that just updates a CKAN resource.

Dynamic Form

Building a dynamic form, while completely feasible, is not an easy task. The good thing is that the fields that ckanext-scheming exposes are limited in number, so there are only a limited number of cases to implement. There are tools like https://uniforms.tools/docs/what-are-uniforms/ that create forms from schemas (even using MUI!), but I'm not sure how flexible they are for creating forms on the fly based on what ckanext-scheming returns.

It is worth pointing out that ckanext-scheming does not return the type of the data, but rather which snippet should be used to render the form. So it does not provide information about how the value is stored or handled.
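
To illustrate what "snippet instead of type" means for a form generator, here is a minimal sketch that picks a widget kind from a field's `form_snippet`. The widget names and the mapping table are hypothetical, the list of snippets is deliberately incomplete, and treating a missing snippet as a plain text input is an assumption on my side.

```python
# Hypothetical mapping from ckanext-scheming form snippets to generic widget kinds.
SNIPPET_TO_WIDGET = {
    "text.html": "text",
    "textarea.html": "textarea",
    "markdown.html": "textarea",
    "select.html": "select",
    "date.html": "date",
}


def widget_for(field: dict) -> str:
    """Pick a widget kind for one schema field, falling back to a text input."""
    # Assumption: a field without an explicit snippet is rendered as plain text.
    snippet = field.get("form_snippet", "text.html")
    # Unknown or custom snippets also degrade to text, since we cannot render them.
    return SNIPPET_TO_WIDGET.get(snippet, "text")
```

Any snippet that an instance defines on its own ends up as a text input here, which is one reason heavily customized instances are hard to support.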

ckanext-scheming provides a list of validators but I assume it will be easier to not double-implement front end validation.
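
If validation is left to the backend, the UI still needs to show what CKAN rejected. The sketch below assumes the usual Action API error envelope, where a failed call returns `success: false` and an `error` object with `__type` set to `Validation Error` plus one list of messages per field. The function name is hypothetical.

```python
def field_errors(payload: dict) -> dict[str, list[str]]:
    """Extract per-field validation messages from a CKAN Action API response."""
    error = payload.get("error") or {}
    if payload.get("success") or error.get("__type") != "Validation Error":
        # Either the call succeeded or it failed for a non-validation reason.
        return {}
    # Every key except "__type" names a field; its value is a list of messages.
    return {
        name: list(messages)
        for name, messages in error.items()
        if name != "__type" and isinstance(messages, list)
    }
```

The returned dictionary could then drive red inputs and inline messages in the form, without re-implementing the validators on the front end.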

pdelboca avatar Jun 24 '24 09:06 pdelboca

Hey @pdelboca. If I understood correctly, publication in CKAN through the ODE seems more complicated than expected because the process is connected to the specific CKAN instance. Also, the best solution would be to work on the publication feature in stages, right? Can we start by publishing a file as a new resource when the dataset already exists?

romicolman avatar Jun 24 '24 12:06 romicolman

I have done some research and a beta implementation to explore the difficulties and challenges.

The work done can be seen at: #1075 .

Considerations:

  • Only resource level (no CKAN datasets)
  • Only file uploads: CKAN supports both URLs and files as a resource. Since ODE works with files we are hard limiting to only uploads.
  • Will work with ckanext-scheming or vanilla CKAN schemas.
  • It will not work with heavily customized CKAN Resources.
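
For reference, a resource-level upload against a vanilla CKAN boils down to one call to the `resource_create` action. This is only a sketch of how the request could be assembled, not what #1075 does; the function name and the `extra_fields` parameter are hypothetical.

```python
from urllib.parse import urljoin


def resource_create_request(base_url, package_id, name, api_token, extra_fields=None):
    """Assemble URL, headers and form data for a CKAN `resource_create` call."""
    url = urljoin(base_url.rstrip("/") + "/", "api/3/action/resource_create")
    # CKAN reads the API token from the Authorization header.
    headers = {"Authorization": api_token}
    # `package_id` identifies the existing dataset; extra fields come from the form.
    data = {"package_id": package_id, "name": name, **(extra_fields or {})}
    return url, headers, data
```

The file itself travels as a multipart field named `upload`. With the `requests` library, for example, the call could be sent as `requests.post(url, headers=headers, data=data, files={"upload": open(path, "rb")})`.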

Observations:

  • Not having a fixed schema for all resources requires some kind of dynamic form that could be built from the metadata that CKAN instances expose (which ckanext-scheming does).
  • Multilingual CKAN instances can return fields and metadata in several languages. A field's label can become a dictionary, and handling it adds complexity. (Match the ODE language? Add a dynamic language selector to the form? Just use field_name?)
  • Detailed error feedback will require some work (red inputs, error messages, etc)
  • Select fields do not always have their options in the schema. Sometimes CKAN relies on choices_helpers that fetch the options from the backend, and that data is not available in the JSON schema. This is a major issue, since ODE users will have no guidance on what to enter in that field.
  • Some CKAN resource UIs have autocomplete, front-end validation and UX features that would be complex or impossible to support.
  • CKAN instances with front-end validation cannot be supported (because we do not get any feedback on the errors).
  • For some fields, like hash and format, CKAN provides a backend guesser/calculator that might lead to inconsistent data in some scenarios, so this will require some further testing/research.
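
One possible way to handle the multilingual labels mentioned above is a small resolver that tries the ODE language first and degrades step by step down to `field_name`. This is just a sketch of one of the options, not a decision; the function name and the `fallback_language` default are assumptions.

```python
def resolve_label(field: dict, language: str, fallback_language: str = "en") -> str:
    """Return a display label for a schema field, whatever shape `label` has."""
    label = field.get("label")
    if isinstance(label, dict):
        # Multilingual instance: label maps language codes to translated text.
        return (
            label.get(language)
            or label.get(fallback_language)
            or next(iter(label.values()), None)
            or field["field_name"]
        )
    # Monolingual instance: label is a plain string, or missing altogether.
    return label or field["field_name"]
```

For a field labelled `{"de": "Titel", "fr": "Titre"}`, asking for `"fr"` gives `Titre`, while a language that is not present falls through to the first available translation.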

Final thoughts

  • A possibility for ODE could be to have a form with only the Required Fields that allow users to create the resource and then complete the process on the platform.
  • For simple CKAN instances, the feature is totally feasible, with the caveat that select fields will have to become text inputs in ODE if no choices are provided in the resource schema.
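
Both ideas can be sketched together: keep only the required fields from `resource_fields`, and turn a select into a text input whenever the schema carries no `choices`. The output structure (`name`, `kind`, `options`) is hypothetical; the `required`, `form_snippet` and `choices` keys are the ones ckanext-scheming uses in its schemas.

```python
def required_form_fields(resource_fields: list[dict]) -> list[dict]:
    """Build a minimal form description containing only the required fields."""
    form = []
    for field in resource_fields:
        if not field.get("required"):
            continue  # optional fields can be completed later on the CKAN side
        choices = field.get("choices")
        if field.get("form_snippet") == "select.html" and choices:
            widget = {"kind": "select", "options": [c["value"] for c in choices]}
        else:
            # No static choices (e.g. they come from a backend helper): fall back to text.
            widget = {"kind": "text"}
        form.append({"name": field["field_name"], **widget})
    return form
```

A select that relies on a backend helper still ends up as free text here, so the user gets no hint about valid values; the sketch does not solve that problem.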

pdelboca avatar Dec 19 '25 15:12 pdelboca