hfst icon indicating copy to clipboard operation
hfst copied to clipboard

Helsinki Finite-State Technology (library and application suite)

Results 114 hfst issues
Sort by recently updated
recently updated
newest added

a typical `hfst-compose-intersect -v` looks like this: ``` Reading and minimizing rule ��_... Reading and minimizing rule �_... Reading and minimizing rule ��_... Reading and minimizing rule �_... Reading and...

As can be seen in https://github.com/hfst/hfst/commit/94b48cb563bbde99691679a755f11030d6a52b4c, the simple task of bumping the version requires touching far too many files. This should be condensed to 1 place and 4-5 variables (API:...

enhancement
help wanted

``` daniel@computer-name-here:~$ cat caps.lexd PATTERNS [x꞉tèeʼ:xte] [nah:na] daniel@computer-name-here:~$ lexd caps.lexd | hfst-txt2fst | hfst-fst2fst -O -o caps.hfstol daniel@computer-name-here:~$ echo "X꞉tèeʼ Nah" | hfst-proc caps.hfstol ^X꞉tèeʼ/xte$ ^Nah/Na$ daniel@computer-name-here:~$ lexd caps.lexd |...

In mns (Mansi) echo 'са̄в' | hfst-tokenise --giella-cg -W $GTHOME/langs/mns/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g $GTHOME/langs/mns/src/syntax/disambiguator.cg3 | less "" "са̄в" ? :\n BUT echo 'са̄в' |hfst-lookup src/analyser-gt-norm.hfstol|less са̄в са̄в+A 0.000000 са̄в са̄в+N+Sg+Nom...

multichar symbols
hfst-tokenize

When we specify a composed character like `ǩ` as a single arc in an `@bin "foo.hfst"`, hfst-tokenise doesn't analyse it. When it's specified as two arcs `k` and then `...

bug
multichar symbols
hfst-tokenize

I assume that the output of `hfst-tokenize --xerox` should be exactly the same as the output of `hfst-lookup` (minus weights?), but it is not: ```bash $ echo "Саша бларгвит." |...

Bug is from here http://wiki.apertium.org/wiki/User:Firespeaker/HFST_bug. You can use attached patch as a test case. Shortly put: assuming dictioary of three strings {word, formation, word form}, the input string {word formation}...

future
proc

`hfst-proc -q` makes it print weights: ```bash fran@ipek:/tmp/apertium-ewo$ echo "bikie" | hfst-proc -w ewo.automorf.hfst !! Warning: Transducer contains one or more multi-character symbols made up of ASCII characters which are...

[tokeniser.pmscript.zip](https://github.com/hfst/hfst/files/1928049/tokeniser.pmscript.zip) contains: ``` set need-separators off Define morphology @bin"analyser_relabelled-disamb-gt-desc.hfst" ; Define incondform Punct ; Define blank Whitespace | incondform ; Define incondword morphology & [ incondform 0:?* ] ; Define...

hfst-tokenize

Currently there are no wheels for Python 3.8 on PyPI