Robert Bastian
This should also land behind an experimental feature.
That's not clear to me from the minutes, and I don't remember details of this discussion. That said, "stable" in 1.5 doesn't mean much anyway, as we can make breaking...
This would basically remove these mappings: https://github.com/unicode-org/icu4x/blob/9491b6358f0e59776cc3a1fe2efb2115a5c02506/provider/datagen/src/transform/segmenter/dictionary.rs#L19-L39, https://github.com/unicode-org/icu4x/blob/9491b6358f0e59776cc3a1fe2efb2115a5c02506/provider/datagen/src/transform/segmenter/lstm.rs#L184-L202, and we'd use aux keys instead. We would have these combinations:
* `DictionaryForWordOnlyAutoV1Marker`, aux key `cjdict`
* `DictionaryForWordLineExtendedV1Marker`, aux keys `khmerdict`, `laodict`,...
In that case I think we should stick with the upstream names. Neither the size of the keys nor the time the binary search over them takes is significant...
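To make the aux-key idea concrete, here is a minimal sketch of a lookup keyed by a (marker name, aux key) pair and resolved with a binary search over a sorted table. The `MODELS` table, the `find_model` helper, and the placeholder payload strings are assumptions made for illustration; this is not the ICU4X data provider API, and only the combinations named in the comment above are listed.

```rust
/// Hypothetical table of ((marker name, aux key), payload) entries.
/// It must stay sorted by key so that binary search is valid.
/// The payloads are placeholder strings, not real model data.
const MODELS: &[((&str, &str), &str)] = &[
    (
        ("DictionaryForWordLineExtendedV1Marker", "khmerdict"),
        "<khmer dictionary payload>",
    ),
    (
        ("DictionaryForWordLineExtendedV1Marker", "laodict"),
        "<lao dictionary payload>",
    ),
    (
        ("DictionaryForWordOnlyAutoV1Marker", "cjdict"),
        "<cj dictionary payload>",
    ),
];

/// Looks up a payload by marker name and aux key using the upstream model
/// name directly as the aux key, so no extra name mapping is needed.
pub fn find_model(marker: &str, aux_key: &str) -> Option<&'static str> {
    MODELS
        .binary_search_by(|(key, _)| key.cmp(&(marker, aux_key)))
        .ok()
        .map(|idx| MODELS[idx].1)
}
```

In this sketch, `find_model("DictionaryForWordOnlyAutoV1Marker", "cjdict")` finds an entry, while pairing that marker with `laodict` returns `None`, because the aux key only resolves under the marker it was registered with.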
Discussion:
* Keep model names, as an extra mapping in any location is not worth the trouble
* *Potentially* remove the `_heavy` suffix, because all models are heavy
* During...
Can we use the AsciiTrie itself to store the canonical identifiers? IIUC it currently stores lowercase identifiers, and we convert requests to lowercase before doing a lookup. Instead, it could...
Discussion:
- @sffc I have considered that, but you'd need to backtrack if you take the wrong branch and it gets complicated
- @robertbastian I'd just disallow two different casings...
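The "disallow two different casings" idea can be sketched as follows. This is an illustration under stated assumptions, not the AsciiTrie implementation: a `BTreeMap` stands in for the trie, and the `CanonicalIds` type with its `insert` and `canonicalize` methods is hypothetical. The point is that once two casings of the same identifier are rejected at build time, a case-insensitive lookup has exactly one canonical answer and never needs to backtrack.

```rust
use std::collections::BTreeMap;

/// Hypothetical store mapping a lowercased identifier to its canonical casing.
/// A BTreeMap is used here as a stand-in for a trie.
#[derive(Default)]
pub struct CanonicalIds {
    by_lowercase: BTreeMap<String, String>,
}

impl CanonicalIds {
    /// Registers a canonical identifier. Returns the already-registered
    /// casing as an error if the same identifier exists with a different one.
    pub fn insert(&mut self, canonical: &str) -> Result<(), String> {
        let folded = canonical.to_ascii_lowercase();
        if let Some(existing) = self.by_lowercase.get(&folded) {
            if existing.as_str() != canonical {
                return Err(existing.clone());
            }
        }
        self.by_lowercase.insert(folded, canonical.to_string());
        Ok(())
    }

    /// Case-insensitive lookup that returns the canonical casing, if any.
    pub fn canonicalize(&self, request: &str) -> Option<&str> {
        self.by_lowercase
            .get(&request.to_ascii_lowercase())
            .map(String::as_str)
    }
}
```

With a made-up identifier such as `Alpha/Beta` registered, a request for `alpha/BETA` resolves to `Alpha/Beta`, and a later attempt to register `ALPHA/beta` is rejected.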
I think we should host the website in https://github.com/unicode-org/icu4x-docs.
Can we use symlinks? I'd rather not copy around, because then we'd have another CI job to enforce consistency, and another make job to "generate" these.