icu4x

Add non-complex segmenter constructors

Open robertbastian opened this issue 1 month ago • 2 comments

Users already do this with custom data (e.g. https://github.com/linebender/parley/pull/436); we should provide an easier way.

robertbastian, Dec 03 '25 22:12

Also, reviewing the Parley thread, I don't think they should be using these new constructors? One of the reasons they want to use icu_segmenter is because it supports non-rule-based segmentation. Disabling it seems counterproductive. If they are observing a multi-MB size increase, it means they might be linking the dictionary data or something. The LSTM is much less than that.

sffc, Dec 04 '25 00:12

> I don't like the name "empty"

The "empty" constructor is on the internal ComplexPayloads type, it's not public API.

> One of the reasons they want to use icu_segmenter is because it supports non-rule-based segmentation. Disabling it seems counterproductive.

They do want to put complex segmentation behind a feature flag or BYOD though, which will be much easier if they don't have to do this through a single API by conditionally changing data provider behaviour.
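
To illustrate the feature-flag approach, here is a minimal sketch of how a downstream crate could pick a constructor at compile time instead of changing data provider behaviour at runtime. The `Segmenter` and `Mode` types, the `make_segmenter` helper, and the `complex-scripts` feature name are all hypothetical stand-ins, not the real icu_segmenter API. Only the name `new_for_non_complex_scripts` comes from this thread, and its actual signature may differ.

```rust
// Hypothetical stand-in types; these are NOT the real icu_segmenter types.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Mode {
    RuleBasedOnly,
    WithComplexScripts,
}

struct Segmenter {
    mode: Mode,
}

#[allow(dead_code)]
impl Segmenter {
    // Stand-in for a constructor that also pulls in LSTM/dictionary data.
    fn new_auto() -> Self {
        Segmenter { mode: Mode::WithComplexScripts }
    }

    // Stand-in named after the constructors discussed in this thread:
    // rule-based segmentation only, no complex-script data linked.
    fn new_for_non_complex_scripts() -> Self {
        Segmenter { mode: Mode::RuleBasedOnly }
    }
}

// With the (hypothetical) "complex-scripts" feature enabled, use the full constructor.
#[cfg(feature = "complex-scripts")]
fn make_segmenter() -> Segmenter {
    Segmenter::new_auto()
}

// Without it, only the rule-based constructor is compiled in.
#[cfg(not(feature = "complex-scripts"))]
fn make_segmenter() -> Segmenter {
    Segmenter::new_for_non_complex_scripts()
}

fn main() {
    let segmenter = make_segmenter();
    println!("segmenter mode: {:?}", segmenter.mode);
}
```

Because the choice is made by `cfg`, the constructor that is not selected is never compiled, so the data it would reference should not be linked.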

robertbastian, Dec 04 '25 10:12

Would you mind listing the new APIs added by this PR?

I think it is the new_for_non_complex_scripts constructors and some default fns on options structs that already implemented the Default trait. Right?

sffc, Dec 18 '25 21:12