
ICU-22119 Treat Korean Hangul as $AL in lw=phrase

Open jungshik opened this issue 3 years ago • 4 comments

  1. Derive {line,line_loose,line_normal}_phrase.txt from {line,line_loose,line_normal}.txt by adding [:Hang:] to $AL, so that a contiguous run of Hangul is not split across lines.
  2. Do not limit lw=phrase to Japanese, because the line_*_phrase.txt files are now used in root.
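
The effect of item 1 can be illustrated with a small sketch. This is not ICU code and does not use the ICU rule files; it is a deliberately simplified model that assumes only two rules (a break is allowed after a space, and between two precomposed Hangul syllables unless Hangul is treated like $AL). The function names `is_hangul_syllable` and `break_opportunities` are hypothetical and exist only for this illustration.

```python
def is_hangul_syllable(ch: str) -> bool:
    # Precomposed Hangul syllables occupy U+AC00..U+D7A3.
    # (Assumption: conjoining jamo are ignored in this simplified model.)
    return "\uac00" <= ch <= "\ud7a3"


def break_opportunities(text: str, hangul_as_al: bool) -> list[int]:
    """Return the indices i where a line break is allowed before text[i].

    Simplified model, not the real line-break algorithm:
      * never break before a space, always allow a break after one;
      * allow a break between two Hangul syllables only when Hangul
        is NOT treated as alphabetic (hangul_as_al is False).
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue
        if prev == " ":
            breaks.append(i)
        elif (not hangul_as_al
              and is_hangul_syllable(prev)
              and is_hangul_syllable(cur)):
            breaks.append(i)
    return breaks


sample = "한국어 문장"
default_breaks = break_opportunities(sample, hangul_as_al=False)
phrase_breaks = break_opportunities(sample, hangul_as_al=True)
print(default_breaks)
print(phrase_breaks)
```

With Hangul treated as alphabetic, the only remaining break opportunity in the sample is after the space, which is the behavior the PR aims for with lw=phrase.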
Checklist
  • [V] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22119
  • [V] Required: The PR title must be prefixed with a JIRA Issue number.
  • [ ] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [V] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [ ] Issue accepted (done by Technical Committee after discussion)
  • [ ] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

jungshik avatar Aug 25 '22 16:08 jungshik

This is an implementation of the first approach mentioned in ICU-22119. It is not ready to merge yet; I am posting it to get opinions on which of the three approaches would be best.

jungshik avatar Aug 25 '22 16:08 jungshik

What is the size increase of the DAT file by this PR? @allensu05

FrankYFTang avatar Aug 31 '22 21:08 FrankYFTang

Thanks! In the ICU meeting last week we agreed on your option 2, so you can probably close this PR here.

markusicu avatar Sep 06 '22 22:09 markusicu

@jungshik ok to close?

markusicu avatar Sep 22 '22 16:09 markusicu