frictionless-r
How to convert a v1 to a v2 package
I wanted to assess the complexity of converting a v1 to a v2 Data Package. Below are the steps that need to be taken. For version detection, see #262. @khusmann could you review these? There are a couple of items I'm unsure about.
Package
Add package.$schema, remove package.profile
Use package.profile, then remove it.
- [ ] `NULL` => https://datapackage.org/profiles/2.0/datapackage.json
- [ ] `data-package` (registered id) => https://datapackage.org/profiles/2.0/datapackage.json
- [ ] `tabular-data-package` (registered id) => https://datapackage.org/profiles/2.0/datapackage.json. This also removes the deprecated `tabular-data-package`
- [ ] `fiscal-data-package` (registered id) => Unsure, should we use the 1.0 URL for `fiscal-data-package`?
- [ ] A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- [ ] Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
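The settled cases of this mapping can be sketched as follows. This is an illustration in Python on a plain dict, not the package's R code; the function name `convert_package_profile` and the constant names are hypothetical. The cases marked "Unsure" above are deliberately left untouched.

```python
V2_PACKAGE_SCHEMA = "https://datapackage.org/profiles/2.0/datapackage.json"

# Profiles whose conversion is settled in the checklist (None stands for NULL).
SETTLED_PACKAGE_PROFILES = (None, "data-package", "tabular-data-package")


def convert_package_profile(package: dict) -> dict:
    """Return a copy of `package` with `profile` replaced by `$schema`.

    Only the settled cases are converted. Anything else (fiscal-data-package,
    a URL, any other value) is returned unchanged, because the checklist
    leaves those cases open.
    """
    result = dict(package)
    profile = result.get("profile")
    if profile in SETTLED_PACKAGE_PROFILES:
        result.pop("profile", None)
        result["$schema"] = V2_PACKAGE_SCHEMA
    return result
```

For example, `convert_package_profile({"name": "p", "profile": "tabular-data-package"})` drops `profile` and adds `$schema`, while a package with `profile: "fiscal-data-package"` comes back as is.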
Add package.contributors.roles
- [ ] For each contributor, set `roles` (array) based on `role` (string). Remove `role`
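A rough sketch of that step, again in Python for illustration only. The function name `convert_contributors` is hypothetical, and keeping an already present `roles` array instead of overwriting it is an assumption, not something decided above.

```python
def convert_contributors(contributors: list) -> list:
    """Return copies of the contributors with `role` (string) turned into
    `roles` (array). The input list and its dicts are not modified."""
    converted = []
    for contributor in contributors:
        contributor = dict(contributor)
        if "role" in contributor:
            role = contributor.pop("role")
            # Assumption: if `roles` already exists, keep it as is.
            contributor.setdefault("roles", [role])
        converted.append(contributor)
    return converted
```

Contributors without a `role` pass through unchanged.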
Other changes
- [x] package.version: documentation change, no action required
- [x] package.contributors: no action required for `title`, `givenName` and `familyName`.
- [ ] package.sources: Unsure, but I think no action is required
Each resource
Add resource.$schema, remove resource.profile
Use resource.profile, then remove it
- [ ] `NULL` => https://datapackage.org/profiles/2.0/dataresource.json
- [ ] `data-resource` (registered id) => https://datapackage.org/profiles/2.0/dataresource.json
- [ ] `tabular-data-resource` (registered id) => https://datapackage.org/profiles/2.0/dataresource.json (but see `resource.type`)
- [ ] A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- [ ] Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
- [ ] There is also the edge case where `$schema` is already present (i.e. a v1 package with a v2 resource). => Unsure, should the present `resource.$schema` be left as is then?
Add resource.type
Use resource.profile:
- [ ] `NULL` => don't set
- [ ] `tabular-data-resource` => `table`
- [ ] Any other value or URL => don't set
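Because both `resource.$schema` and `resource.type` are derived from `resource.profile`, the type has to be read before the profile is removed. A minimal sketch of that ordering, in Python for illustration. The name `convert_resource` is hypothetical, and returning a resource unchanged when it already has a `$schema` is only one possible answer to the open question above.

```python
V2_RESOURCE_SCHEMA = "https://datapackage.org/profiles/2.0/dataresource.json"

# Profiles whose conversion is settled in the checklist (None stands for NULL).
SETTLED_RESOURCE_PROFILES = (None, "data-resource", "tabular-data-resource")


def convert_resource(resource: dict) -> dict:
    """Return a copy of `resource` with `$schema` and `type` derived from
    `profile`, for the settled cases only."""
    result = dict(resource)
    if "$schema" in result:
        # Assumption: an existing $schema is left as is (open question).
        return result
    profile = result.get("profile")
    if profile not in SETTLED_RESOURCE_PROFILES:
        # A URL or any other value: unresolved, so nothing is changed.
        return result
    if profile == "tabular-data-resource":
        result["type"] = "table"  # read the profile before removing it
    result.pop("profile", None)
    result["$schema"] = V2_RESOURCE_SCHEMA
    return result
```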
Other changes
- [x] resource.sources: no change required
- [x] resource.name: rules are relaxed, existing names can remain as is
- [x] resource.path: dot-paths are now forbidden. In the edge case where such a path is provided, we should not convert it, because it is impossible to know what the correct path would be. These paths will be flagged when reading a resource.
- [x] resource.encoding: allows more, no action required
For each dialect
Note that upconverting a dialect requires a remote one to be downloaded and included in full in the package.
Add dialect.$schema
- [ ] `dialect.caseSensitiveHeader` is present => https://datapackage.org/profiles/1.0/tabledialect.json
- [ ] `dialect.csvddfVersion` is present => https://datapackage.org/profiles/1.0/tabledialect.json
- [ ] Otherwise this can safely be set to https://datapackage.org/profiles/2.0/tabledialect.json
Unsure about this though. For example, if a dialect was absent (very often the case), one will be added with just the $schema property. The alternative is to leave all dialects as v1 (assuming a $schema that defaults to https://datapackage.org/profiles/1.0/tabledialect.json). That would also mean that remote dialects can stay remote.
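If the first option is chosen, the version choice could look like the sketch below (Python, for illustration only; `dialect_schema` is a hypothetical name). It only encodes the rule listed above and does not settle whether absent dialects should get a `$schema` at all.

```python
V1_DIALECT_SCHEMA = "https://datapackage.org/profiles/1.0/tabledialect.json"
V2_DIALECT_SCHEMA = "https://datapackage.org/profiles/2.0/tabledialect.json"

# Properties that, per the checklist, mark a dialect as v1.
V1_ONLY_PROPERTIES = ("caseSensitiveHeader", "csvddfVersion")


def dialect_schema(dialect: dict) -> str:
    """Pick the `$schema` URL for a dialect based on v1-only properties."""
    if any(prop in dialect for prop in V1_ONLY_PROPERTIES):
        return V1_DIALECT_SCHEMA
    return V2_DIALECT_SCHEMA
```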
Other changes
- [ ] dialect.table: new property, no action required
For each schema
Note that upconverting a schema requires a remote one to be downloaded and included in full in the package.
Add schema.$schema
- [ ] Set to https://datapackage.org/profiles/2.0/tableschema.json because we will update the schema to that version.
Update schema.primaryKey
- [ ] Convert from string to array.
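As a small illustration of this step (Python, hypothetical function name `convert_primary_key`): a string key becomes a one-element array, and a key that is already an array, or absent, is left alone.

```python
def convert_primary_key(schema: dict) -> dict:
    """Return a copy of `schema` with a string `primaryKey` wrapped in a list."""
    result = dict(schema)
    primary_key = result.get("primaryKey")
    if isinstance(primary_key, str):
        result["primaryKey"] = [primary_key]
    return result
```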
Update schema.foreignKeys
- [ ] Convert `schema.foreignKeys.fields` from string to array
- [ ] Convert `schema.foreignKeys.reference[x].fields` from string to array
- [ ] If `schema.foreignKeys.reference[x].resource` = resource name => remove property
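The three foreign key steps above could be combined as in this sketch (Python, for illustration; `convert_foreign_keys` and `as_list` are hypothetical names). It assumes each foreign key has a `fields` property and a `reference` object with its own `fields`, and that the name of the resource owning the schema is passed in by the caller.

```python
def as_list(value):
    """Wrap a string in a list; return anything else unchanged."""
    return [value] if isinstance(value, str) else value


def convert_foreign_keys(schema: dict, resource_name: str) -> dict:
    """Return a copy of `schema` with foreign keys in array form and
    self-references (reference.resource == resource_name) dropped."""
    result = dict(schema)
    if "foreignKeys" not in schema:
        return result
    converted = []
    for foreign_key in schema["foreignKeys"]:
        foreign_key = dict(foreign_key)
        foreign_key["fields"] = as_list(foreign_key["fields"])
        reference = dict(foreign_key["reference"])
        reference["fields"] = as_list(reference["fields"])
        if reference.get("resource") == resource_name:
            # The key points at its own resource, so the property is removed.
            del reference["resource"]
        foreign_key["reference"] = reference
        converted.append(foreign_key)
    result["foreignKeys"] = converted
    return result
```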
No action required
- [x] schema.missingValues: old format still valid, no action required
- [x] schema.fieldMatch: this is `exact` for all v1, but that is also the default for this field, so no need to set it
- [x] schema.uniqueKeys: new property, no action required
For each field
Other changes
- [x] field.categories: new property, no action => We can't assume that every field with an `enum` should be converted to a field with `categories`.
- [x] fields.categoriesOrdered: new property, no action required
- [x] fields.missingValues: new property, no action required
- [x] integer field type: `groupChar` is a new property, no action required
- [x] list field type: new property, no action required
- [x] datetime field type: default format merely extends current one, no action required
- [x] geopoint field type: documentation update, no action required
- [x] any field type: no conversion needed, but frictionless needs to interpret it differently when reading (see #168)
- [x] min/max constraints: can now be used for duration, no action needed
- [x] exclusiveMin/Max constraints: new property, no action required
- [x] jsonSchema constraint: new property, no action required