Add a segmenter constructor for pure rule-based line and word segmentation

Open Manishearth opened this issue 1 year ago • 0 comments

Spawned out of https://github.com/unicode-org/icu4x/issues/3575

Some users may want to run only rule-based segmentation, for example on low-resource systems.

https://github.com/unicode-org/icu4x/issues/3575 concluded that the dictionary/LSTM keys must always be present (even if empty) to construct a segmenter. Given that, it is probably worth adding a pure rule-based constructor to the segmenter, so that purely rule-based segmenters can be used without having to build your own data with dummy dictionary/LSTM keys, and so that the dictionary/LSTM code is not used.
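To make the shape of the request concrete, here is a minimal toy sketch of a segmenter whose complex-script backend is optional. Every name in it (`ToySegmenter`, `ComplexPayloads`, `new_rule_based`, `new_with_complex`) is hypothetical and not ICU4X API, and the single whitespace rule only stands in for the real UAX #14/#29 rule tables.

```rust
// Hypothetical illustration only: none of these names are ICU4X API.

/// Stand-in for the dictionary/LSTM data a complex-script backend would load.
#[allow(dead_code)]
struct ComplexPayloads {
    dictionary_words: Vec<String>,
}

/// Toy segmenter whose complex-script backend is optional.
struct ToySegmenter {
    complex: Option<ComplexPayloads>,
}

#[allow(dead_code)]
impl ToySegmenter {
    /// Pure rule-based constructor: needs no dictionary/LSTM data,
    /// not even empty placeholder keys.
    fn new_rule_based() -> Self {
        ToySegmenter { complex: None }
    }

    /// Constructor that also carries complex-script data.
    fn new_with_complex(payloads: ComplexPayloads) -> Self {
        ToySegmenter { complex: Some(payloads) }
    }

    /// Reports whether dictionary/LSTM data was supplied.
    fn uses_complex_backend(&self) -> bool {
        self.complex.is_some()
    }

    /// Returns breakpoints as byte offsets, using one toy rule:
    /// break wherever the text switches between whitespace and non-whitespace.
    /// The start of the text and (for non-empty text) its end are always included.
    fn segment(&self, text: &str) -> Vec<usize> {
        let mut breaks = vec![0];
        let mut prev_is_ws: Option<bool> = None;
        for (i, c) in text.char_indices() {
            let is_ws = c.is_whitespace();
            if let Some(prev) = prev_is_ws {
                if prev != is_ws {
                    breaks.push(i);
                }
            }
            prev_is_ws = Some(is_ws);
        }
        if !text.is_empty() {
            breaks.push(text.len());
        }
        breaks
    }
}
```

With this sketch, `ToySegmenter::new_rule_based().segment("hello world")` yields breakpoints at byte offsets 0, 5, 6 and 11 without any payload ever being constructed, which is the property the issue asks for.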

cc @eggrobin

Manishearth, Jul 03 '23 09:07