CLDR-15725 compound unit transforms
CLDR-15725
- [ ] This PR completes the ticket.
Put in a proof of concept, with rules using regex. Added a spreadsheet with current results for just a few rule sets: https://docs.google.com/spreadsheets/d/1L58nWgxn8sWiOmfTf1VgdejRp12fJqZ91fG5ru4v-VQ/edit#gid=0
Need to analyze more cases to see what rules would result.
This looks very promising! (Presumably we want the code in ICU eventually...)
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
Fleshed out a bit more, added a first cut at the list rules for ICU. Also tested inserting spaces between Chinese and non-Chinese letters, and changing 'a' to 'an' before vowels in English.
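For readers following along, here is a minimal sketch of what two such regex rules might look like. It is not the code in this PR: the function names are hypothetical, and because Python's `re` module has no script properties, Han is approximated with two explicit code point ranges (an assumption that leaves out the later CJK extensions).

```python
import re

# Assumption: approximate Han with CJK Unified Ideographs plus Extension A.
HAN = r"[\u3400-\u4dbf\u4e00-\u9fff]"
# Any Unicode letter: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"
# A letter that is not in the approximate Han set above.
NON_HAN_LETTER = rf"(?!{HAN}){LETTER}"

HAN_THEN_OTHER = re.compile(rf"({HAN})({NON_HAN_LETTER})")
OTHER_THEN_HAN = re.compile(rf"({NON_HAN_LETTER})({HAN})")
# A standalone "a"/"A" followed by a space and a vowel letter.
A_BEFORE_VOWEL = re.compile(r"\b([Aa]) (?=[aeiouAEIOU])")


def space_han_boundaries(text: str) -> str:
    """Insert a space wherever a Han letter touches a non-Han letter."""
    text = HAN_THEN_OTHER.sub(r"\1 \2", text)
    return OTHER_THEN_HAN.sub(r"\1 \2", text)


def fix_indefinite_article(text: str) -> str:
    """Turn 'a' into 'an' before a word that starts with a vowel letter."""
    return A_BEFORE_VOWEL.sub(r"\1n ", text)
```

Because both Han rules match letters only, punctuation and digits next to a Han character are left alone. The article rule looks at spelling rather than pronunciation, so words like "hour" are not handled by this sketch.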
Looks good but I had a concern about spacing getting added between Han characters and CJK punctuation like brackets.
Good question. That rule only adds spaces between letters (between a Han letter and a non-Han letter). I suspect it will need some refinement.
On Mon, Jul 18, 2022 at 5:04 PM Peter Edberg @.***> wrote:
Looks good but I had a concern about spacing getting added between Han characters and CJK punctuation like brackets.
If the current code does not add spaces between Han chars and punctuation, that is good; I was afraid that it did. I think we may need to add spaces between Han chars and Latin digits, will check on that. But it seems like the current code makes some of the improvements we want without doing any harm, so that is good.
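If spacing between Han characters and Latin digits does turn out to be wanted, a rule along these lines could express it. This is only a sketch under assumptions: the function name is hypothetical, Han is approximated with two explicit ranges, and "Latin digits" is taken to mean ASCII 0-9.

```python
import re

# Assumption: approximate Han with CJK Unified Ideographs plus Extension A.
HAN = r"[\u3400-\u4dbf\u4e00-\u9fff]"
# Assumption: "Latin digits" means ASCII 0-9 only.
ASCII_DIGIT = r"[0-9]"

# Zero-width match at every boundary between a Han character and a digit.
HAN_DIGIT_BOUNDARY = re.compile(
    rf"(?<={HAN})(?={ASCII_DIGIT})|(?<={ASCII_DIGIT})(?={HAN})"
)


def space_han_digits(text: str) -> str:
    """Insert a space at each Han/digit boundary; leave everything else alone."""
    return HAN_DIGIT_BOUNDARY.sub(" ", text)
```

The lookarounds match positions rather than characters, so punctuation next to Han characters and digits next to Latin letters are left as they are.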