cldr CLDR-15725 compound unit transforms

CLDR-15725

[ ] This PR completes the ticket.

Jul 04 '22 04:07 macchiati

Put in a proof of concept, with rules using regex. Added spreadsheet with current results, just a few rule sets. https://docs.google.com/spreadsheets/d/1L58nWgxn8sWiOmfTf1VgdejRp12fJqZ91fG5ru4v-VQ/edit#gid=0

Need to analyze more cases, seeing what rules would result.

Jul 05 '22 22:07 macchiati

This looks very promising! (Presumably we want the code in ICU eventually...)

Jul 05 '22 22:07 pedberg-icu

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Jul 17 '22 19:07 jira-pull-request-webhook[bot]

Fleshed out a bit more and added a first cut at the list rules for ICU. Also tested inserting spaces between Chinese and non-Chinese letters, and changing 'a' to 'an' before vowels in English.

Jul 18 '22 17:07 macchiati

Looks good but I had a concern about spacing getting added between Han characters and CJK punctuation like brackets.

Jul 19 '22 00:07 pedberg-icu

Good question. The rule only adds spaces between letters (between a Han letter and a non-Han letter). I suspect it will need some refinement....

Jul 19 '22 01:07 macchiati

If the current code does not add spaces between Han chars and punctuation, that is good; I was afraid that it did. I think we may need to add spaces between Han chars and Latin digits, will check on that. But it seems like the current code makes some of the improvements we want without doing any harm, so that is good.

Jul 19 '22 03:07 pedberg-icu