Clean up CheckXmlProperties

Open macchiati opened this issue 3 years ago • 4 comments

See https://unicode-org.github.io/unicodetools/newunicodeproperties.html#checking-xml-properties

  1. Make into JUnit test
  2. Fix the failures and warnings due to differences in handling (default values) and problems with the test (interpreting #, and \u00xxx.)
  3. Also:
  • The Meroitic fraction series, expressed as twelfths. They won't match exactly if the stored fractional values are all reduced. This should be marked as expected, if this is the way the test works.
  • Named_Sequences_Prov is always provisional and never approved. It shouldn't be in the XML (and isn't) and shouldn't be tested for (other than its non-appearance, I suppose).
  • kRSJapanese, kRSKanWa, and kRSKorean were provisional fields, never given informational status, and have been yanked altogether from Unihan. They also should not appear and not be tested for.

See also issue #125 “Remove false positives from the ucdxml consistency tests”

macchiati, Aug 30 '22 16:08

Mark, in email you wrote about the Meroitic fractions: “the UcdProperties have the reduced values” Is that necessary? As long as we still store the fraction strings, it would be more illustrative to keep the fractions as-is, rather than simplifying them.

markusicu, Aug 31 '22 23:08

The software right now treats them as numerical values — which is the way most software will. So I think it is a mistake to treat them just as the surface form.

I can just smarten the test to compare the numeric value of both the xml format and the UcdProperty format.

macchiati, Aug 31 '22 23:08

> The software right now treats them as numerical values — which is the way most software will. So I think it is a mistake to treat them just as the surface form.

Depends on what the goal is. I doubt that most software does anything with numeric values of characters other than gc=Nd. For a precise, readable, understandable presentation of the value, these fractions are best written as in UnicodeData.txt.

> I can just smarten the test to compare the numeric value of both the xml format and the UcdProperty format.

ok -- presumably with an epsilon for dealing with rounding issues

markusicu, Aug 31 '22 23:08

Can do exact match: we have a Rational number class in CLDR.

macchiati, Sep 01 '22 00:09
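
For illustration only, here is a minimal sketch of the exact comparison discussed above: two fraction strings (say, an unreduced 2/12 from one source and a reduced 1/6 from the other) are compared by cross-multiplication, so no epsilon is needed. It does not use the CLDR Rational class, whose API is not shown in this thread; the class name `RationalCompare` and the methods `parse` and `sameValue` are hypothetical and are not part of unicodetools or CheckXmlProperties.

```java
import java.math.BigInteger;

public class RationalCompare {

    /** Parses "n" or "n/d" into {numerator, denominator} without reducing it. */
    static BigInteger[] parse(String s) {
        String[] parts = s.trim().split("/");
        if (parts.length > 2) {
            throw new NumberFormatException("not a fraction: " + s);
        }
        BigInteger num = new BigInteger(parts[0].trim());
        BigInteger den = parts.length == 2
                ? new BigInteger(parts[1].trim())
                : BigInteger.ONE;
        if (den.signum() == 0) {
            throw new ArithmeticException("zero denominator: " + s);
        }
        return new BigInteger[] {num, den};
    }

    /** True if both strings denote the same rational value: a/b == c/d iff a*d == c*b. */
    static boolean sameValue(String a, String b) {
        BigInteger[] x = parse(a);
        BigInteger[] y = parse(b);
        return x[0].multiply(y[1]).equals(y[0].multiply(x[1]));
    }

    public static void main(String[] args) {
        System.out.println(sameValue("2/12", "1/6"));  // true: same value, different surface form
        System.out.println(sameValue("1/12", "1/6"));  // false: different values
    }
}
```

Comparing this way leaves both surface forms untouched, so the question of whether the stored strings should be reduced stays separate from what the test checks.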