icu4x
Add a segmenter constructor for pure rule-based line and word segmentation
Spawned out of https://github.com/unicode-org/icu4x/issues/3575
Some users may want to run only rule-based segmentation, for example on low-resource systems.
https://github.com/unicode-org/icu4x/issues/3575 concluded that the dictionary/LSTM keys must always be present (even if empty) to construct a segmenter. Given that, it is probably worth adding a pure rule-based constructor to the segmenter. Purely rule-based segmenters could then be used without building your own data with dummy dictionary/LSTM keys, and the dictionary/LSTM code would not be used at all.
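To make the shape of the request concrete, here is a minimal, self-contained sketch of the idea. It is not ICU4X code: `ToySegmenter`, `ComplexData`, `new_with_complex`, and `new_rule_based` are all hypothetical names invented for illustration, and the "rule" is a toy one (break before the first non-space character after a space). The only point is that the complex-script data becomes optional, so a rule-only constructor needs no dummy payload.

```rust
// Hypothetical sketch only: none of these types or names are ICU4X APIs.

/// Stand-in for the dictionary/LSTM data used for complex scripts.
struct ComplexData {
    dictionary: Vec<String>,
}

/// Toy segmenter whose complex-script data is optional.
struct ToySegmenter {
    complex: Option<ComplexData>,
}

impl ToySegmenter {
    /// Mirrors the current situation: the caller must supply complex data,
    /// even if it is empty.
    fn new_with_complex(complex: ComplexData) -> Self {
        Self { complex: Some(complex) }
    }

    /// Mirrors the proposal: no dictionary/LSTM data is needed at all.
    fn new_rule_based() -> Self {
        Self { complex: None }
    }

    /// Whether this segmenter was built with complex-script data.
    fn uses_complex_data(&self) -> bool {
        self.complex.is_some()
    }

    /// Number of dictionary entries loaded (0 for a rule-only segmenter).
    fn dictionary_len(&self) -> usize {
        self.complex.as_ref().map_or(0, |c| c.dictionary.len())
    }

    /// Toy rule: report a break (as a byte offset) before the first
    /// non-space character following a space, plus one at end of text.
    fn segment(&self, text: &str) -> Vec<usize> {
        let mut breaks = Vec::new();
        let mut prev_space = false;
        for (i, c) in text.char_indices() {
            if prev_space && c != ' ' {
                breaks.push(i);
            }
            prev_space = c == ' ';
        }
        breaks.push(text.len());
        breaks
    }
}
```

In this sketch, `ToySegmenter::new_rule_based().segment("hello big world")` yields the same break offsets as a segmenter built with an empty `ComplexData`, but without the caller having to construct any placeholder data.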
cc @eggrobin