
Filter low-frequency suffixes from Dense ZeroTrie

Open · sffc opened this issue 1 month ago · 1 comment

Suffixes that occur in a low percentage of rows should not be added to the dense matrix.

Docs for background: https://unicode-org.github.io/icu4x/rustdoc/zerotrie/dense/struct.ZeroAsciiDenseSparse2dTrieOwned.html

In theory, we could calculate how much each suffix adds to the total size, but doing so would require recomputing the whole structure, which might be costly. A good first step would be to add a test case that we want to optimize and then pick a heuristic that minimizes the size of that test case.
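As a rough illustration of what such a heuristic could look like, here is a minimal sketch that counts how many rows each suffix appears in and keeps only the suffixes above a cutoff as dense columns. Everything in it is an assumption for discussion: `partition_suffixes`, `example_rows`, the `min_row_fraction` parameter, and the map-of-maps input are hypothetical and are not part of the `zerotrie` API or its builder.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper (not part of zerotrie): splits suffixes into those
/// frequent enough to become dense-matrix columns and those that should be
/// left out of the dense matrix.
///
/// `rows` maps each prefix (row) to its suffix -> value entries.
/// `min_row_fraction` is an assumed tuning knob: the minimum fraction of rows
/// a suffix must appear in to be kept as a dense column.
fn partition_suffixes<'a>(
    rows: &BTreeMap<&'a str, BTreeMap<&'a str, usize>>,
    min_row_fraction: f64,
) -> (BTreeSet<&'a str>, BTreeSet<&'a str>) {
    // Count the number of rows in which each suffix occurs.
    let mut row_counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for suffixes in rows.values() {
        for suffix in suffixes.keys() {
            *row_counts.entry(*suffix).or_insert(0) += 1;
        }
    }

    let total_rows = rows.len();
    let mut dense = BTreeSet::new();
    let mut filtered = BTreeSet::new();
    // If `rows` is empty, `row_counts` is empty too, so no division by zero.
    for (suffix, count) in row_counts {
        let fraction = count as f64 / total_rows as f64;
        if fraction >= min_row_fraction {
            dense.insert(suffix);
        } else {
            filtered.insert(suffix);
        }
    }
    (dense, filtered)
}

/// Made-up sample data: four rows and three suffixes.
fn example_rows() -> BTreeMap<&'static str, BTreeMap<&'static str, usize>> {
    let mut rows = BTreeMap::new();
    rows.insert("en", BTreeMap::from([("long", 1), ("short", 2)]));
    rows.insert("fr", BTreeMap::from([("long", 3), ("short", 4)]));
    rows.insert("de", BTreeMap::from([("long", 5)]));
    rows.insert("ja", BTreeMap::from([("long", 6), ("narrow", 7)]));
    rows
}
```

With the sample data and a cutoff of `0.5`, `long` (4 of 4 rows) and `short` (2 of 4 rows) stay dense, while `narrow` (1 of 4 rows) is filtered out. The cutoff itself is the part that would need to be tuned against the test case.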

sffc, Dec 11 '25 01:12

assign this to me

s3arthak, Dec 11 '25 09:12

If you want to work on this, please open a PR and link it to this issue. We don't use GitHub assignees for non-milestone issues.

sffc, Dec 11 '25 18:12