
ICU-22789 Add Segmenter API to conveniently wrap BreakIterator

Open echeran opened this issue 4 months ago • 0 comments

To modernize the BreakIterator API, this PR introduces a wrapper with a more convenient API design built around a Segmenter interface.

A few of the goals that motivate the new Segmenter API:

  • Use Java 8 features, in particular the Stream API, which supports a functional programming style.
  • Create immutable instances. This reduces the complexity that comes from statefulness and lets user code be more referentially transparent.
  • Create a wrapper class around the iteration. This decouples iterating over a source string from constructing the BreakIterator, so iteration over one string proceeds in isolation from other strings.
  • Use interfaces to decouple and abstract. Higher-level APIs built on interfaces can accept user-created implementations.
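
The goals above can be illustrated with a minimal sketch. It is not the API proposed in this PR: the names `SegmenterSketch`, `Segmenter`, `WordSegmenter`, `segments`, and `countSegments` are hypothetical, and the sketch wraps the JDK's `java.text.BreakIterator` rather than ICU4J's `com.ibm.icu.text.BreakIterator` so that it runs without extra dependencies.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmenterSketch {

    /** Hypothetical interface (not the PR's API): splits a string into segments. */
    public interface Segmenter {
        Stream<String> segments(String source);
    }

    /** Immutable word segmenter; its only state is a final Locale. */
    public static final class WordSegmenter implements Segmenter {
        private final Locale locale;

        public WordSegmenter(Locale locale) {
            this.locale = locale;
        }

        @Override
        public Stream<String> segments(String source) {
            // A new BreakIterator per call keeps the mutable iteration state
            // local, so segmenting one string cannot disturb another.
            BreakIterator it = BreakIterator.getWordInstance(locale);
            it.setText(source);
            List<String> out = new ArrayList<>();
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                out.add(source.substring(start, end));
            }
            return out.stream();
        }
    }

    /** Higher-level helper written against the interface only. */
    public static long countSegments(Segmenter segmenter, String source) {
        return segmenter.segments(source).count();
    }

    public static void main(String[] args) {
        Segmenter words = new WordSegmenter(Locale.ENGLISH);
        List<String> parts = words.segments("Hello world").collect(Collectors.toList());
        System.out.println(parts);

        // A user-created implementation can join the same higher-level API.
        Segmenter commaSplit = s -> Arrays.stream(s.split(","));
        System.out.println(countSegments(commaSplit, "a,b,c"));
    }
}
```

Because `WordSegmenter` holds no iteration state, one instance can be reused across strings, and `countSegments` accepts any `Segmenter`, including the lambda-based one. A real design might return segment boundaries or ranges rather than substrings; substrings are used here only to keep the sketch short.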

More details in the design doc.

Checklist

  • [X] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22789
    • For minor changes you can use one of the following per-release tickets:
    • ICU 77 code warnings/version updates: ICU-22920 — Fix compiler warnings. Update versions of code-related dependencies (e.g., dependabot).
    • ICU 77 docs minor fixes: ICU-22921 — User Guide & API docs typos etc., and version updates (e.g., dependabot for User Guide)
  • [X] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [X] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [X] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [X] Issue accepted (done by Technical Committee after discussion)
  • [X] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

echeran, Oct 08 '24 23:10