icu4x
Filter low-frequency suffixes from Dense ZeroTrie
Suffixes that occur in a low percentage of rows should not be added to the dense matrix.
Docs for background: https://unicode-org.github.io/icu4x/rustdoc/zerotrie/dense/struct.ZeroAsciiDenseSparse2dTrieOwned.html
In theory, we could calculate how much each suffix adds to the size, but doing so would require recomputing the whole structure, which might be costly. A good first step would be to add a test case that we want to optimize, then pick a heuristic that minimizes the size of that test case.
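As a starting point for discussion, here is a rough sketch of one possible frequency heuristic. It is not based on the actual `zerotrie` builder code: the function name `dense_suffix_candidates`, the `min_percent` parameter, and the prefix-to-suffix-set input shape are all assumptions made for illustration. It counts how many rows contain each suffix and keeps only the suffixes that reach the threshold; the rest would presumably stay in the sparse part.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper, not part of the `zerotrie` crate.
///
/// `rows` maps each prefix (one row of the matrix) to the set of suffixes
/// (columns) that have a value in that row. Returns the suffixes that occur
/// in at least `min_percent` percent of the rows, i.e. the candidates for
/// the dense matrix under this heuristic.
pub fn dense_suffix_candidates<'a>(
    rows: &BTreeMap<&'a str, BTreeSet<&'a str>>,
    min_percent: usize,
) -> BTreeSet<&'a str> {
    let total_rows = rows.len();

    // Count the number of rows in which each suffix appears.
    let mut row_counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for suffixes in rows.values() {
        for &suffix in suffixes {
            *row_counts.entry(suffix).or_insert(0) += 1;
        }
    }

    // Keep suffixes whose row share meets the threshold. The comparison is
    // done in integers to avoid floating-point rounding.
    row_counts
        .into_iter()
        .filter(|&(_, count)| count * 100 >= min_percent * total_rows)
        .map(|(suffix, _)| suffix)
        .collect()
}
```

The threshold value itself is the open question; the test case mentioned above would be the way to pick it, by measuring the serialized size for a few candidate values.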
assign this to me
If you want to work on this, please open a PR and link it to this issue. We don't use GitHub assignees for non-milestone issues.