hyperglot
Compare Hyperglot data to Unicode CLDR, possibly also export to CLDR
The comparison branch now has tools/cldr_comparison.html. Save it locally and open it in a browser for a side-by-side comparison between Hyperglot and CLDR.
In addition to the technical notes at the beginning, a few observations:
- CLDR is missing plenty of languages HG lists (no surprise)
- I think most languages listed as "not in Hyperglot" are mapping issues between IANA and ISO 639-1/2/3 codes and macrolanguage/deprecated tags
- The tag shown in the list follows HG where possible and IANA language tags otherwise (so a CLDR xxx.xml might not be listed as xxx, but under the matching language tag, if one was found)
- CLDR has quite a few autonyms HG is missing
- CLDR does locale ("territory") and script inheriting, so characters in bo.xml should be inherited by bo_Cyrl.xml if it has none of its own. This isn't implemented for the comparison, so the locale/script variants in CLDR that rely on this implicit inheritance have no characters and thus show all characters as missing (present in HG only).
- Many of the CLDR locales are not different orthographies per se, but are just listed.
- There is no attempt to find any "alternate" HG orthographies to compare those to: firstly because it is not possible (HG has no locale or similar key to map to), and secondly because it would create a many-to-many comparison and explode the table.
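For reference, the missing inheritance step could be sketched roughly as below. This is only an illustration, not what the comparison script does: the function names and the `exemplars` dict are hypothetical, the character sets are placeholders, and the parent lookup is plain tag truncation (bo_Cyrl falls back to bo), which ignores the explicit parent overrides CLDR also defines.

```python
from typing import Dict, List, Set


def fallback_chain(locale: str) -> List[str]:
    """Return the locale followed by its truncation parents, most specific first."""
    parts = locale.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def resolve_characters(locale: str, exemplars: Dict[str, Set[str]]) -> Set[str]:
    """Use the locale's own characters, or those of the nearest parent that has any."""
    for candidate in fallback_chain(locale):
        chars = exemplars.get(candidate)
        if chars:  # skip locales that are missing or list no characters
            return chars
    return set()


# Hypothetical, placeholder data: the script variant lists no characters itself.
cldr_exemplars = {"bo": {"a", "b", "c"}, "bo_Cyrl": set()}

inherited = resolve_characters("bo_Cyrl", cldr_exemplars)
```

With a step like this, a variant file without its own characters would be compared using its parent's set instead of showing every character as missing.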
For now I'm closing this, as we have done a basic comparison and nothing actionable is proposed right now. If we want automated export or upstream contributions, we need to spec that separately.