ICU-22789 Add Segmenter API to conveniently wrap BreakIterator
In order to "modernize" the `BreakIterator` API, this PR introduces a new wrapper using a more convenient, modern API design around a `Segmenter` interface.
A few of the goals that motivate the new `Segmenter` API:
- Use newer Java features from Java 8 that support the `Stream` API, which underlies a functional programming style
- Create instances that are immutable (reduces complexity born of statefulness; allows user code to be more referentially transparent)
- Create a wrapper class around the iteration. This decouples iterating over a source string from constructing the `BreakIterator`, so that iteration over one string is isolated from iteration over other strings
- Use interfaces to properly decouple and abstract. APIs built on top of interfaces allow user-created implementations to participate in such higher-level APIs.
More details in the design doc.
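
To make the goals above concrete, here is a minimal sketch of the general idea. It is not the API added by this PR: the names `SimpleSegmenter` and `WordSegmenter` are hypothetical, and it uses the JDK's `java.text.BreakIterator` instead of ICU's `com.ibm.icu.text.BreakIterator` so that it runs without extra dependencies. It only illustrates an immutable, interface-based wrapper that creates a fresh iterator per source string and exposes the result as a `Stream`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

public class SegmenterSketch {

    /** Hypothetical interface for illustration; user code could supply its own implementations. */
    interface SimpleSegmenter {
        Stream<String> segments(String source);
    }

    /**
     * Hypothetical immutable implementation. It stores only the locale and
     * builds a new BreakIterator for every call, so iterating over one
     * string never disturbs iteration over another.
     */
    static final class WordSegmenter implements SimpleSegmenter {
        private final Locale locale;

        WordSegmenter(Locale locale) {
            this.locale = locale;
        }

        @Override
        public Stream<String> segments(String source) {
            // Assumption: JDK BreakIterator stands in for the ICU class here.
            BreakIterator bi = BreakIterator.getWordInstance(locale);
            bi.setText(source);

            List<String> out = new ArrayList<>();
            int start = bi.first();
            // Walk boundary to boundary, collecting each [start, end) slice.
            for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
                out.add(source.substring(start, end));
            }
            return out.stream();
        }
    }

    public static void main(String[] args) {
        SimpleSegmenter segmenter = new WordSegmenter(Locale.ENGLISH);
        // The same segmenter instance can be reused for any number of strings.
        segmenter.segments("Hello world").forEach(s -> System.out.println("[" + s + "]"));
        segmenter.segments("A second string").forEach(s -> System.out.println("[" + s + "]"));
    }
}
```

Because the sketch collects segments eagerly into a list, it trades laziness for simplicity; a production design might stream boundaries lazily instead.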
Checklist
- [X] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22789
- For minor changes you can use one of the following per-release tickets:
- ICU 77 code warnings/version updates: ICU-22920 — Fix compiler warnings. Update versions of code-related dependencies (e.g., dependabot).
- ICU 77 docs minor fixes: ICU-22921 — User Guide & API docs typos etc., and version updates (e.g., dependabot for User Guide)
- [X] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
- [X] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [X] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
- [X] Issue accepted (done by Technical Committee after discussion)
- [X] Tests included, if applicable
- [ ] API docs and/or User Guide docs changed or added, if applicable