lttoolbox
Finite state compiler, processor and helper tools used by apertium
For regular bidix `lt-proc -b`, we want to just copy over unconsumed tags, and that is fine:
```
$ echo '^kake$' | lt-proc -b nob-nno.autobil.bin
^kake/kake$
```
When using regular generation...
In which I am mildly opinionated about C++ code formatting. Not sure if this should go here and get copied to all the other repos or if there's somewhere else...
This PR adds a new binary format for transducers which is compatible with memory mapping and adds to `lt-proc` the ability to load it via mmap. It also makes the...
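To illustrate the general idea of loading a binary file through memory mapping rather than reading it into a buffer, here is a minimal Python sketch. The file layout (a little-endian `uint32` count followed by that many `uint32` values) and the names `write_table` and `read_entry` are hypothetical and chosen only for illustration; this is not the format the PR defines, nor how `lt-proc` loads it.

```python
import mmap
import os
import struct
import tempfile


def write_table(path, values):
    """Write a toy file: a uint32 count, then that many uint32 values (hypothetical layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(values)))
        f.write(struct.pack(f"<{len(values)}I", *values))


def read_entry(path, index):
    """Read one value by offset from a memory-mapped file, without loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (count,) = struct.unpack_from("<I", mm, 0)
            if not 0 <= index < count:
                raise IndexError(index)
            # Each entry sits at a fixed offset, so no parsing pass is needed.
            (value,) = struct.unpack_from("<I", mm, 4 + 4 * index)
            return value


path = os.path.join(tempfile.mkdtemp(), "toy.bin")
write_table(path, [10, 20, 30])
third = read_entry(path, 2)
```

Because every entry sits at a fixed offset, the reader touches only the pages it needs. That property is what a mmap-compatible format would be expected to provide.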
**GSoC Task 4** Write a utility that processes a tagged corpus and the binary lttoolbox file and returns weighted analyses. Status: _In progress_
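As a rough sketch of what estimating weights from a tagged corpus could look like, the following Python counts how often each analysis was chosen for a surface form and turns the relative frequency into a negative log probability. Several things here are assumptions rather than part of the task description: that the corpus uses `^surface/analysis$` units, that the weight is `-log P(analysis | surface)`, and the function name `analysis_weights`. The sketch also ignores the binary lttoolbox file entirely.

```python
import math
import re
from collections import Counter

# One lexical unit of the assumed form ^surface/analysis$
UNIT = re.compile(r"\^([^/^$]+)/([^/^$]+)\$")


def analysis_weights(tagged_text):
    """Return {(surface, analysis): weight} with weight = -log P(analysis | surface)."""
    pair_counts = Counter()
    surface_counts = Counter()
    for surface, analysis in UNIT.findall(tagged_text):
        pair_counts[(surface, analysis)] += 1
        surface_counts[surface] += 1
    return {
        pair: -math.log(count / surface_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


# Toy corpus (made up for illustration): "saw" tagged three times as a verb, once as a noun.
corpus = (
    "^saw/see<vblex><past>$ ^saw/saw<n><sg>$ "
    "^saw/see<vblex><past>$ ^saw/see<vblex><past>$"
)
weights = analysis_weights(corpus)
```

With this convention a lower weight means a more frequent analysis, so the verb reading of "saw" in the toy corpus gets a smaller weight than the noun reading.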
Given the following paradigms and entries: ``` e e e e ed e ing e ing e ing e e e es e ed e 's 's s s' 's...
`apertium-pretransfer` has option `-e treat ~ as compound separator` – I don't know if any other tools have this, but it would be nice if we could implement support for...
If the dictionary has
```xml
kakenekake pc-anepc PC-anePC
```
then we get
```sh
$ echo '^kake$ ^KAKE$ ^kake$'|lt-proc -C nob.autogen.bin
kakene kakene
```
I would like it to just fall...
b.dix:
```xml
[a-zA-Z]+
```
```sh
$ lt-comp lr b.dix b.bin
regex@standard 3 105
$ echo '^HYPERSENSITIVITET$' | \time lt-proc -b b.bin
^HYPERSENSITIVITET/HYPERSENSITIVITET$
0.18user 0.04system 0:00.22elapsed 100%CPU (0avgtext+0avgdata 141920maxresident)k
0inputs+0outputs (0major+36137minor)pagefaults...
```
At the moment we add regexes in sections, and minimising regexes takes a long time. Perhaps we could have a special `type="regex"` section that is not minimised; it would speed...