icu4x
char16trie converter
The segmenter now uses char16trie for the dictionary segmenter. Some East Asian dictionaries can be removed or replaced by LSTM models, but Chinese and Japanese still use a dictionary.
The current data is generated by ICU4C's tools, and the binary output of those tools is then converted to a TOML file. So I think it would be better to add a tool that generates char16trie data directly from ICU4C's dictionary text files.
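As a rough illustration of the first step such a tool would need, here is a minimal sketch that reads dictionary source text into sorted key/value pairs. The line format is an assumption on my part (one word per line, optionally followed by a tab and an integer value, with `#` starting a comment), and `parse_dictionary` is a hypothetical helper, not an existing ICU4X or ICU4C function. Serializing the pairs into the char16trie format is not shown.

```rust
/// Parse dictionary source text into (UTF-16 key, value) pairs,
/// sorted by UTF-16 code units and with duplicate keys removed.
///
/// Assumed line format (not confirmed in this issue): a word, optionally
/// followed by a tab and an integer value; `#` starts a comment.
/// Words without a value get 0.
pub fn parse_dictionary(text: &str) -> Result<Vec<(Vec<u16>, i32)>, String> {
    // Ignore a leading byte order mark if the file has one.
    let text = text.trim_start_matches('\u{feff}');
    let mut entries = Vec::new();

    for (idx, raw) in text.lines().enumerate() {
        // Drop a trailing comment, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }

        // Split into the word and an optional value field.
        let mut fields = line.splitn(2, '\t');
        let word = fields.next().unwrap_or("").trim();
        let value = match fields.next() {
            Some(v) => v
                .trim()
                .parse::<i32>()
                .map_err(|e| format!("line {}: {}", idx + 1, e))?,
            None => 0,
        };

        // The trie is keyed on UTF-16 code units, so convert here.
        entries.push((word.encode_utf16().collect::<Vec<u16>>(), value));
    }

    // Sort by key (then value) and keep the lowest-valued entry per key.
    entries.sort();
    entries.dedup_by(|later, earlier| later.0 == earlier.0);
    Ok(entries)
}
```

The sorted pairs would then be handed to whatever builds the trie, whether that is new Rust code or the ICU4C builder.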
Consider doing what we did for the CodePointTrieBuilder: rather than writing the code ourselves, we compile the ICU4C builder code into a WASM file and ship that in our repo.
@makotokato Does this block any other issues? Can you set an assignee (or "help wanted") and a milestone (or "backlog")?
Not a blocker.