Split into `unicode-data-core` and `unicode-data`

Following discussion in PR #75:

To avoid making too many packages, we could have just two: a lightweight package called unicode-data-core, and the all-inclusive unicode-data package, which bundles everything, including unicode-data-core.

Currently there are 4 packages depending on unicode-data.

We should define criteria for deciding which APIs belong in the core package. As of now, names and scripts are not considered “core”. What about blocks?
This will require a major version bump.
What about the existing package unicode-names? If we do not create unicode-data-blocks-scripts, perhaps we could deprecate it in favor of the new batteries-included unicode-data.

I would propose the following plan:

Merge #75.
Publish unicode-data-0.3.1 with all changes so far.
Update to Unicode 15.0.
Publish unicode-data-0.4.0 and names.
Rename unicode-data to unicode-data-core.
Re-create unicode-data as a package that re-exports all unicode-data-* packages.
Publish unicode-data-core-1.0, unicode-data-names-1.0, unicode-data-scripts-1.0, unicode-data-security-0.1 and unicode-data-1.0.

@harendra-kumar @adithyaov @Bodigrim

Sep 14 '22 08:09 wismill

Sounds good to me.

Sep 24 '22 11:09 adithyaov

When we do the split, I propose the following new version scheme: U.B.M, where:

U is the supported Unicode major version, e.g. 15 for Unicode 15.0.0.
B marks breaking changes: a minor Unicode update, or any other change requiring a major version bump according to the PVP. It resets to 0 with every new major Unicode version.
M marks non-breaking changes, such as API additions (see the PVP).
All the PVP rules apply.
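As a rough illustration of how the U.B.M components would evolve, here is a minimal Haskell sketch using only `Data.Version` from base. The helper names (`forUnicodeMajor`, `bumpBreaking`, `bumpNonBreaking`, `supportedUnicodeMajor`) are hypothetical and not part of any unicode-data package, and the sketch assumes every version has exactly three components. Since the PVP treats the first two components together as the major version, bumping B is itself a PVP major bump, so the scheme stays PVP-compliant.

```haskell
import Data.Version (Version (..), makeVersion, showVersion)

-- Hypothetical helpers illustrating the proposed U.B.M scheme:
--   U = Unicode major version
--   B = breaking changes (including minor Unicode updates)
--   M = non-breaking changes (e.g. API additions)

-- | First release for a new Unicode major version: U.0.0.
forUnicodeMajor :: Int -> Version
forUnicodeMajor u = makeVersion [u, 0, 0]

-- | Breaking change or minor Unicode update: bump B and reset M.
bumpBreaking :: Version -> Version
bumpBreaking v = case versionBranch v of
  [u, b, _] -> makeVersion [u, b + 1, 0]
  _         -> error ("not a U.B.M version: " ++ showVersion v)

-- | Non-breaking change: bump M only.
bumpNonBreaking :: Version -> Version
bumpNonBreaking v = case versionBranch v of
  [u, b, m] -> makeVersion [u, b, m + 1]
  _         -> error ("not a U.B.M version: " ++ showVersion v)

-- | The Unicode major version a release supports is its first component.
supportedUnicodeMajor :: Version -> Int
supportedUnicodeMajor v = case versionBranch v of
  (u : _) -> u
  []      -> error "empty version"

main :: IO ()
main = do
  let v0 = forUnicodeMajor 15   -- 15.0.0: first release for Unicode 15.0.0
      v1 = bumpNonBreaking v0   -- 15.0.1: e.g. an API addition
      v2 = bumpBreaking v1      -- 15.1.0: e.g. Unicode 15.1, or a PVP-breaking change
      v3 = forUnicodeMajor 16   -- 16.0.0: first release for Unicode 16.0.0
  mapM_ (putStrLn . showVersion) [v0, v1, v2, v3]
  print (map supportedUnicodeMajor [v0, v2, v3])
```

Note that `supportedUnicodeMajor` can only recover U: a 15.1.0 release might mean Unicode 15.1 or a breaking API change on top of Unicode 15.0, which is the "does not reflect the exact Unicode version" trade-off listed among the cons.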

PROS:

It is easier to identify what Unicode version is supported.
The Unicode versioning scheme is very stable, and minor updates are uncommon.
It promotes the packages by marking them as mature (a 0. prefix usually means beta).

CONS:

Increasing the major version from 0 usually indicates that the software is very stable and production-ready; not all unicode-data-* packages have reached this stage yet. Perhaps this should apply only to packages we judge production-ready, keeping the usual 0.X.Y scheme for beta packages. I would say at least unicode-data, unicode-data-core and unicode-data-names are candidates for version 15.0.0.
It makes our version scheme depend on a third party’s scheme. But that scheme is very stable, and we already bump the version for Unicode updates.
It does not reflect the exact Unicode version. But minor Unicode updates are uncommon.
The numbers are large and look like a browser version. I do not mind versions greater than 10, and we can expect at most one major version bump a year.
Some packages may see no change with a new Unicode version. This is unlikely, as a new Unicode version usually adds new characters, which modifies the bitmaps. It may happen if the new characters only have default values for the properties a package covers. This is not the case for Unicode 15.0.0.

Change in the plan:

If accepted: skip version 1.0 and publish version 15.0.0 instead.

Sep 27 '22 05:09 wismill

Your pros and cons seem pretty thorough, and the cons do not look significant. We can go with this scheme. I wonder whether there is anything to learn from the ICU versioning scheme: https://icu.unicode.org/processes

Oct 06 '22 21:10 harendra-kumar

We should probably send an email to @Bodigrim for his opinion, in case we are missing something.

Oct 06 '22 21:10 harendra-kumar

Sounds good to me.

(Sorry, I have extremely limited bandwidth at the moment and this is unlikely to improve soon, so feel free to act without waiting for me)

Oct 07 '22 17:10 Bodigrim
