
CLDR-15725 compound unit transforms

Open macchiati opened this issue 3 years ago • 7 comments

CLDR-15725

  • [ ] This PR completes the ticket.

macchiati, Jul 04 '22 04:07

Put in a proof of concept, with rules using regexes. Added a spreadsheet with the current results (just a few rule sets so far): https://docs.google.com/spreadsheets/d/1L58nWgxn8sWiOmfTf1VgdejRp12fJqZ91fG5ru4v-VQ/edit#gid=0

Need to analyze more cases to see what rules would result.

macchiati, Jul 05 '22 22:07
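A minimal sketch of what "rules using regex" could look like, assuming each rule set is an ordered list of regex/replacement pairs applied one after another, so that each rule sees the output of the previous one. The names `Rule`, `make_rule`, `apply_rules`, and `RULE_SETS`, and the whitespace-cleanup rules themselves, are hypothetical illustrations, not the proof of concept's actual code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One regex rewrite step (hypothetical structure)."""
    pattern: re.Pattern
    replacement: str


def make_rule(pattern: str, replacement: str) -> Rule:
    return Rule(re.compile(pattern), replacement)


def apply_rules(text: str, rules: list) -> str:
    """Apply each rule in order; later rules see earlier rules' output."""
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text


# Hypothetical rule sets keyed by name; real CLDR/ICU rule data would differ.
RULE_SETS = {
    "cleanup": [
        make_rule(r"\s+", " "),   # collapse runs of whitespace to one space
        make_rule(r"^ | $", ""),  # then drop a leading or trailing space
    ],
}

if __name__ == "__main__":
    print(apply_rules("  5   km  per   hour ", RULE_SETS["cleanup"]))
```

Order matters here: the trimming rule only removes a single space, so it relies on the collapsing rule having run first.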

This looks very promising! (Presumably we want the code in ICU eventually...)

pedberg-icu, Jul 05 '22 22:07

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Fleshed out a bit more, and added a first cut at the list rules for ICU. Also tested inserting spaces between Chinese and non-Chinese letters, and changing 'a' to 'an' before vowels in English.

macchiati, Jul 18 '22 17:07

Looks good, but I had a concern about spaces getting added between Han characters and CJK punctuation such as brackets.

pedberg-icu, Jul 19 '22 00:07

Good question. That only adds spaces between letters (between a Han letter and a non-Han letter). I suspect it will need some refinement...


macchiati, Jul 19 '22 01:07

If the current code does not add spaces between Han characters and punctuation, that is good; I was afraid that it did. I think we may need to add spaces between Han characters and Latin digits; I will check on that. But it seems the current code makes some of the improvements we want without doing any harm, so that is good.

pedberg-icu, Jul 19 '22 03:07
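A sketch of the suggested Han/Latin-digit extension, assuming the same ordered regex-rule approach discussed in this thread; the rule list, the `transform` name, and the Han block ranges are illustrative assumptions. Because neither pattern matches punctuation, fullwidth brackets never get a space next to them.

```python
import re

# Approximate Han with three BMP ideograph blocks (assumption, not \p{Han}).
HAN = r"[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF]"

# Hypothetical rules: a space between Han and ASCII digits, in either order.
# Punctuation (including fullwidth brackets) matches neither side, so no
# space is ever inserted next to it.
HAN_DIGIT_RULES = [
    (re.compile(rf"(?<={HAN})(?=[0-9])"), " "),
    (re.compile(rf"(?<=[0-9])(?={HAN})"), " "),
]


def transform(text: str) -> str:
    for pattern, replacement in HAN_DIGIT_RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, `transform("共5公里")` returns `"共 5 公里"`, and `transform("长度（5公里）")` returns `"长度（5 公里）"`, with nothing added next to the brackets.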