
Detect Data Package version with `version()` function

Open peterdesmet opened this issue 1 year ago • 5 comments

To support Data Package v2 we need to be able to detect the version used by a package.

  • [ ] $schema is undefined
    • version = 1.0
    • We theoretically should look at the profile property in this case (see backward compatibility note) but frictionless-r ignores this property since it doesn't use it (it is useful for validation etc.).
  • [ ] $schema = https://datapackage.org/profiles/1.0/datapackage.json
    • version = 1.0
    • profile is ignored (since new property $schema is used)
  • [ ] $schema = https://datapackage.org/profiles/2.0/datapackage.json
    • version = 2.0
    • profile is ignored (deprecated in 2.0)
  • [ ] $schema = https://datapackage.org/profiles/2.1-rc.1/datapackage.json
    • version = 2.1-rc.1 (theoretical example)
    • profile is ignored
  • [ ] $schema = https://fiscal.datapackage.org/profiles/fiscal-data-package.json, https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json or any other value
    • version = >=2.0: we can't detect the exact version, but it is at least 2.0 since $schema is used.
    • profile is ignored

Even if we did read profile when $schema is undefined, the end result would still be version = 1.0:

  • [x] profile is undefined
    • profile is assumed to be data-package (see https://specs.frictionlessdata.io/profiles/#introduction)
    • version = 1.0 (implied by profile use)
  • [x] profile = data-package
    • version = 1.0 (implied by profile use)
  • [x] profile = tabular-data-package, fiscal-data-package, https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json or any other value
    • version = 1.0 (implied by profile use)
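
The detection rules above can be sketched as follows. This is a minimal illustration in Python rather than frictionless-r code: the function name `detect_version()` and the regular expression are assumptions, and the descriptor is assumed to be already parsed into a dict (the equivalent of an R list).

```python
import re

# Assumption: official profile URLs have the shape
# https://datapackage.org/profiles/<version>/<file>.json
OFFICIAL_SCHEMA = re.compile(
    r"^https://datapackage\.org/profiles/([^/]+)/[^/]+\.json$"
)


def detect_version(descriptor):
    """Return the Data Package version implied by a parsed descriptor."""
    schema = descriptor.get("$schema")
    if schema is None:
        # No $schema: version 1.0, whatever `profile` contains.
        return "1.0"
    match = OFFICIAL_SCHEMA.match(schema)
    if match:
        # Official profile URL: the version is the path segment
        # that follows /profiles/.
        return match.group(1)
    # Any other $schema value: the exact version is unknown, but
    # $schema itself only exists from 2.0 onwards.
    return ">=2.0"
```

With this sketch, `detect_version({"profile": "tabular-data-package"})` returns `"1.0"`, and a descriptor whose `$schema` is the camtrap-dp profile URL returns `">=2.0"`. Because the file name in the URL is not inspected, the same function works for package, resource, dialect and schema descriptors.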

In my opinion, the best way to implement this is with a version(package) function. This allows us in the future to create a version()<- function. Alternative names:

  • get_version(): this would prevent us from having a version() in the future. I'm tempted to rename all functions that start with get_.
  • package_version(): the version logic is the same for package, resource, dialect, schema: it's just the name of the file in the URL that is different (datapackage.json, dataresource.json). I therefore think we can make one function for all of these, rather than four functions.

I think we can generalize to a version(list) function:

  • [ ] Use the logic above for any incoming list (JSON). Search for $schema and get the version from the URL if it starts with https://datapackage.org; otherwise use 1.0 (if $schema is undefined) or >=2.0.
  • [ ] Setting the version is a bit harder, since we can't tell from the list alone (especially for a value like https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json) whether it's a package, resource, etc. I think this can be solved with an extra argument in the set function: version(level = "schema") <- 2.0 would assign https://datapackage.org/profiles/2.0/tableschema.json
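
The setter idea could look roughly like this, again as a Python illustration rather than the R replacement function. The names `schema_url()` and `set_version()` are hypothetical, and the level-to-file mapping is an assumption: only datapackage.json, dataresource.json and tableschema.json are mentioned in this issue, while tabledialect.json is assumed by analogy.

```python
# Assumed mapping from descriptor level to profile file name.
PROFILE_FILES = {
    "package": "datapackage.json",
    "resource": "dataresource.json",
    "dialect": "tabledialect.json",  # assumption, not named in this issue
    "schema": "tableschema.json",
}


def schema_url(version, level="package"):
    """Build the official $schema URL for a version and descriptor level."""
    if level not in PROFILE_FILES:
        raise ValueError(f"Unknown level: {level!r}")
    return f"https://datapackage.org/profiles/{version}/{PROFILE_FILES[level]}"


def set_version(descriptor, version, level="package"):
    """Return a copy of the descriptor with $schema set accordingly."""
    updated = dict(descriptor)  # shallow copy, the input is left untouched
    updated["$schema"] = schema_url(version, level)
    return updated
```

Here `schema_url("2.0", level="schema")` gives https://datapackage.org/profiles/2.0/tableschema.json, matching the example above. Whether setting the version should also drop an existing profile property is left open in this sketch.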

peterdesmet avatar Aug 29 '24 16:08 peterdesmet