faster/simpler Age / DerivedAge.txt generation

Open · markusicu opened this issue 7 months ago · 0 comments

We currently compute a character's Age by iterating over all Unicode versions from 1.1 to latest=dev and returning the first version where its code point is no longer unassigned.

This means that we have to load/parse data files for each Unicode version.
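For reference, the current brute-force computation amounts to the loop below. This is a minimal sketch under stated assumptions: `BruteForceAge` and the `assignedByVersion` map are hypothetical stand-ins, not unicodetools APIs. Each map entry represents one version's set of assigned code points, which in the real tool requires loading and parsing that version's data files. That per-version cost is what the issue wants to avoid.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the brute-force approach: one set of assigned code points
// per Unicode version, iterated in ascending version order (1.1 ... latest=dev).
// In unicodetools, each entry would correspond to loading and parsing that
// version's data files.
class BruteForceAge {
    static final String NA = "NA";

    // Returns the first version in which cp is no longer unassigned, or "NA" if none.
    static String age(int cp, LinkedHashMap<String, Set<Integer>> assignedByVersion) {
        for (Map.Entry<String, Set<Integer>> entry : assignedByVersion.entrySet()) {
            if (entry.getValue().contains(cp)) {
                return entry.getKey();
            }
        }
        return NA;
    }
}
```

Because every version must be visited for any code point that is still unassigned, the work grows with the number of Unicode versions.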

This should be unnecessary, because we already have a DerivedAge.txt file from at least the last version. We should be able to:

  • parse the dev=latest file
  • recent additions: take the set of [[:age=NA:]&[:^gc=Cn:]] and map them to the latest version
  • recent removals: take the set of [[[:^age=NA:]&[:gc=Cn:]]-[:NChar:]] and map them to "NA" (noncharacters have gc=Cn but keep their Age)
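The three steps above can be sketched as a single pass over all code points. Assumptions: `IncrementalAge`, `updateAge`, and the `isCn` predicate are hypothetical names, not existing unicodetools code. The `previousAge` map stands in for the parsed DerivedAge.txt, and the real tool would presumably use UnicodeSet operations rather than a loop. The sketch excludes noncharacters from the removal step because they have gc=Cn but still carry an Age (for example, U+FFFE..U+FFFF are listed with age 1.1). A plain gc=Cn test would therefore reset them to NA.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

// Hypothetical sketch of the proposed incremental Age update; not the unicodetools API.
class IncrementalAge {
    static final String NA = "NA";
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // previousAge: Age values parsed from the existing DerivedAge.txt (absent key = NA).
    // isCn: whether cp has General_Category=Cn in the dev data.
    // Returns the updated Age map, again omitting NA entries.
    static Map<Integer, String> updateAge(Map<Integer, String> previousAge,
                                          IntPredicate isCn, String latestVersion) {
        Map<Integer, String> updated = new TreeMap<>();
        for (int cp = 0; cp <= MAX_CODE_POINT; cp++) {
            String age = previousAge.getOrDefault(cp, NA);
            boolean cn = isCn.test(cp);
            if (age.equals(NA) && !cn) {
                // Recent addition: [[:age=NA:]&[:^gc=Cn:]] -> latest version.
                age = latestVersion;
            } else if (!age.equals(NA) && cn && !isNoncharacter(cp)) {
                // Recent removal: [[:^age=NA:]&[:gc=Cn:]] minus noncharacters -> NA.
                age = NA;
            }
            if (!age.equals(NA)) {
                updated.put(cp, age);
            }
        }
        return updated;
    }
}
```

Only the previous DerivedAge.txt and the dev General_Category data are needed, so no older versions have to be loaded.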

We should move the old logic into a test, to verify that the simple logic yields the same Age property map as the brute-force code.
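The proposed cross-check needs a way to compare the two Age maps and report mismatches. This is a minimal sketch, assuming both maps are keyed by code point and omit NA entries. `AgeMapDiff` is a hypothetical helper, and a real test would presumably run inside the project's existing test framework and report code point ranges rather than single code points.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for the proposed test: compares the Age map produced by the
// old brute-force code with the one produced by the simple incremental logic.
class AgeMapDiff {
    static final String NA = "NA";

    // Returns the code points whose Age differs; a missing key is treated as "NA".
    static SortedSet<Integer> differences(Map<Integer, String> expected,
                                          Map<Integer, String> actual) {
        SortedSet<Integer> allKeys = new TreeSet<>(expected.keySet());
        allKeys.addAll(actual.keySet());
        SortedSet<Integer> diffs = new TreeSet<>();
        for (int cp : allKeys) {
            String want = expected.getOrDefault(cp, NA);
            String got = actual.getOrDefault(cp, NA);
            if (!want.equals(got)) {
                diffs.add(cp);
            }
        }
        return diffs;
    }

    // Formats one mismatch for a failure message, e.g. "U+0042: expected 2.0, got NA".
    static String describe(int cp, Map<Integer, String> expected, Map<Integer, String> actual) {
        return String.format("U+%04X: expected %s, got %s", cp,
                expected.getOrDefault(cp, NA), actual.getOrDefault(cp, NA));
    }
}
```

Treating absent keys as NA lets the two implementations differ in how they store unassigned code points without producing false failures.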

Discussion: See this comment and replies to it: https://github.com/unicode-org/unicodetools/pull/1116#issuecomment-2848036146

markusicu, May 02 '25 21:05