Kevin Brubeck Unhammer

Results: 302 comments by Kevin Brubeck Unhammer

I wouldn't say it's a bug with hyperminimization – this is just how hypermin works: it inserts flags all around. We should be able to skip/ignore those flags when intersecting...

The workaround for now is to ensure we have a single code point on each arc, so in the `spellrelax.regex` script that creates combining-character versions of each corresponding non-combined character,...
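To make the "single code point on each arc" idea concrete, here is a minimal Python sketch of the difference between a precomposed character and its combining-character version. It does not use HFST at all; `code_points` and `split_for_arcs` are hypothetical helper names introduced only for illustration, and the assumption is simply that each arc symbol should correspond to one element of the decomposed sequence.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of `text` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def split_for_arcs(text: str) -> list[str]:
    """Hypothetical helper: decompose `text` (NFD) and return one
    symbol per code point, i.e. what a one-code-point-per-arc
    representation would have to spell out."""
    return list(unicodedata.normalize("NFD", text))


precomposed = "\u01E9"  # ǩ as a single code point
combining = "k\u030C"   # k followed by COMBINING CARON

print(code_points(precomposed))  # ['U+01E9']
print(code_points(combining))    # ['U+006B', 'U+030C']

# Both spellings are canonically equivalent under NFC.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True

# The combining version needs two symbols, one per code point.
print(code_points("".join(split_for_arcs(precomposed))))  # ['U+006B', 'U+030C']
```

The two strings render identically but differ in length, which is why the combining version has to be written as two separate symbols rather than one multi-code-point symbol if every arc is to carry exactly one code point.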

Haha, how did we miss that … that does seem to solve it :)
```
$ hfst-tokenise -nxw -m k.pmhfst < k.input
ǩ ǩ 0 ǩ ǩ 0
```
@Traubert...

@snomos after c6331179cfbc24b4ddee73e97b518865c3d6b326, hfst-tokenise's --giella-cg mode does --tokenize-multichar by default – I guess this can be closed? (I see now from my notes that I originally dismissed `tokenize_multichar` as only...

Argh, that won't do :( I guess I should revert the default until we figure out why that happens

``` $ echo ' N' |hfst-tokenize -g -m /usr/share/giella/sme/tokeniser-disamb-gt-desc.pmhfst : N\n $ echo ' N' |hfst-tokenize -g /usr/share/giella/sme/tokeniser-disamb-gt-desc.pmhfst : "" "n" N Sem/Sign ABBR Gram/TAbbr Attr "n" N Sem/Sign ABBR...

even better if we could get https://www.aclweb.org/anthology/W13-5641.pdf 's fancy compression thing (but where is the code for that?)

Is this still an issue with the newest hfst? I can't reproduce it: ``` echo "они стартуют" | ~/src/hfst/tools/src/hfst-tokenize --giella-cg tokeniser-disamb-gt-desc.pmhfst "" "они" Pron Pers Pl3 Nom : "" "стартовать"...