Clean up CheckXmlProperties
See https://unicode-org.github.io/unicodetools/newunicodeproperties.html#checking-xml-properties
- Make into JUnit test
- Fix the failures and warnings caused by differences in handling (default values) and by problems in the test itself (interpreting # and \u00xxx).
- Also:
- The Meroitic fraction series, expressed as twelfths. They won't match exactly if the stored fractional values are all reduced. This should be marked as expected, if this is the way the test works.
- Named_Sequences_Prov is always provisional and never approved. It shouldn't be in the XML (and isn't) and shouldn't be tested for (other than its non-appearance, I suppose).
- kRSJapanese, kRSKanWa, and kRSKorean were provisional fields, never given informational status, and have been yanked altogether from Unihan. They also should not appear and should not be tested for.
See also issue #125 “Remove false positives from the ucdxml consistency tests”
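One way the "should not appear, should not be tested for" items above could be handled is with an explicit exclusion list that the test consults before comparing values. The following is only a sketch: the class name `XmlPropertyFilter` and both method names are hypothetical and are not part of unicodetools. The property names are the ones listed in this issue.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for the XML property check; not existing unicodetools code. */
final class XmlPropertyFilter {
    private XmlPropertyFilter() {}

    /** Properties named in this issue as ones that must not appear in the XML. */
    static final Set<String> EXPECTED_ABSENT =
            Set.of("Named_Sequences_Prov", "kRSJapanese", "kRSKanWa", "kRSKorean");

    /** True if the test should compare XML values against the UCD values for this property. */
    static boolean shouldCompareValues(String propertyName) {
        return !EXPECTED_ABSENT.contains(propertyName);
    }

    /** Returns the excluded properties that nevertheless show up among the XML attribute names. */
    static Set<String> unexpectedlyPresent(Set<String> xmlAttributeNames) {
        Set<String> result = new TreeSet<>(xmlAttributeNames);
        result.retainAll(EXPECTED_ABSENT);
        return result;
    }
}
```

A JUnit test could then skip value comparison when `shouldCompareValues` returns false and fail only if `unexpectedlyPresent` returns a non-empty set.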
Mark, in email you wrote about the Meroitic fractions: “the UcdProperties have the reduced values”. Is that necessary? As long as we still store the fraction strings, it would be more illustrative to keep the fractions as-is, rather than simplifying them.
The software right now treats them as numerical values — which is the way most software will. So I think it is a mistake to treat them just as the surface form.
I can just smarten the test to compare the numeric value of both the xml format and the UcdProperty format.
The software right now treats them as numerical values — which is the way most software will. So I think it is a mistake to treat them just as the surface form.
Depends on what the goal is. I doubt that most software does anything with numeric values of characters other than gc=Nd. For a precise, readable, understandable presentation of the value, these fractions are best written as in UnicodeData.txt.
I can just smarten the test to compare the numeric value of both the xml format and the UcdProperty format.
ok -- presumably with an epsilon for dealing with rounding issues
Can do exact match: we have a Rational number class in CLDR.
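As an illustration of the exact-match idea, two fraction strings such as "2/12" and "1/6" can be compared without floating point or an epsilon by cross-multiplying. This sketch deliberately does not use the CLDR Rational class mentioned above, since its API is not shown in this thread. `FractionCompare`, `parse`, and `sameValue` are hypothetical names, and the input is assumed to be either an integer or a single "numerator/denominator" string.

```java
import java.math.BigInteger;

/** Hypothetical stand-in for an exact rational comparison; not the CLDR Rational class. */
final class FractionCompare {
    private FractionCompare() {}

    /** Parses "n" or "n/d" into {numerator, denominator}; rejects anything else. */
    static BigInteger[] parse(String value) {
        // limit -1 keeps trailing empty parts, so "1/" fails instead of parsing as 1
        String[] parts = value.trim().split("/", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("not a simple fraction: " + value);
        }
        BigInteger numerator = new BigInteger(parts[0].trim());
        BigInteger denominator =
                parts.length == 2 ? new BigInteger(parts[1].trim()) : BigInteger.ONE;
        if (denominator.signum() == 0) {
            throw new IllegalArgumentException("zero denominator: " + value);
        }
        return new BigInteger[] {numerator, denominator};
    }

    /** True if both strings denote the same rational number, e.g. "2/12" and "1/6". */
    static boolean sameValue(String first, String second) {
        BigInteger[] x = parse(first);
        BigInteger[] y = parse(second);
        // a/b == c/d exactly when a*d == c*b, given non-zero denominators
        return x[0].multiply(y[1]).equals(y[0].multiply(x[1]));
    }
}
```

With a comparison like this, the unreduced twelfths from UnicodeData.txt and reduced stored values would count as equal, while the surface strings stay untouched.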