Francis Tyers
@anjalibhavan well, the first version would be just the algorithm as described in the paper. Later the tool would support weighting lttoolbox transducers according to the vocabulary of the tool.
There is a Python implementation at https://github.com/rsennrich/subword-nmt
The way to do this right now is basically to use vocabulary coverage over a corpus. This is the best indicator of the quality of a pair. This is something...
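To make the coverage idea concrete, here is a minimal sketch of measuring vocabulary coverage over a corpus. It assumes the vocabulary is available as a plain set of lowercased word forms. In practice the known/unknown decision would come from the pair's analyser rather than a set lookup. The function name `coverage` and the toy data are made up for illustration.

```python
from collections import Counter


def coverage(tokens, vocabulary):
    """Return (token_coverage, type_coverage) of `vocabulary` over `tokens`.

    Assumption: `vocabulary` is a set of lowercased surface forms; a real
    setup would ask the morphological analyser whether each token is known.
    """
    counts = Counter(t.lower() for t in tokens)
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        return 0.0, 0.0
    # Token coverage: share of running words that are known.
    known_tokens = sum(c for w, c in counts.items() if w in vocabulary)
    # Type coverage: share of distinct words that are known.
    known_types = sum(1 for w in counts if w in vocabulary)
    return known_tokens / total_tokens, known_types / len(counts)


# Toy data, not from any real pair.
corpus = "the cat sat on the mat with the dog".split()
vocab = {"the", "cat", "sat", "on", "mat"}
token_cov, type_cov = coverage(corpus, vocab)
print(f"token coverage: {token_cov:.1%}, type coverage: {type_cov:.1%}")
```

Token coverage is usually the figure quoted, since frequent words dominate it; type coverage is stricter and shows how many distinct forms are still missing.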
@Ryu945 I can't do it, but you could! :) I don't expect that such a script should take longer than an hour or two to write.
```
fran@matxine:~/source/apertium/staging/apertium-mlt-heb$ gdb lt-print
GNU gdb (Debian 8.1-4+b1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are...
```
Ok, this seems like a classic error and might be a duplicate. Here is the fix:

```
diff --git a/apertium-mlt-heb.mlt-heb.dix b/apertium-mlt-heb.mlt-heb.dix
index 8c75d3d..d9dd506 100644
--- a/apertium-mlt-heb.mlt-heb.dix
+++ b/apertium-mlt-heb.mlt-heb.dix
@@ -88,6...
```
Here is the relevant code: https://github.com/apertium/lttoolbox/blob/master/lttoolbox/buffer.h#L74
@mr-martian that sounds a bit more complicated. Also, it would be cool to be able to give weights to sections, but I'll open another issue for that.
Then we should do that. Could you do it to `ell-eng` and any other `*-eng` pairs that have received no edits since they were made?
So one way this could work is:

* Load original transducer, `A`
* Read tagged corpus into a weighted FST, `B`
* Intersect `B` and `A`, making `C`
* Priority...
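The first three steps can be sketched without a real FST library, using plain dictionaries as stand-ins. This is only an illustration under loud assumptions: `A` is modelled as a mapping from surface form to its possible analyses, `B` as counts of (surface, analysis) pairs from a tagged corpus, and the weights are negative log relative frequencies. The weighting scheme, the function name `weight_analyses`, and the toy data are all hypothetical, not what lttoolbox does.

```python
import math
from collections import Counter


def weight_analyses(analyser, tagged_corpus):
    """Weight the analyses in `analyser` that are attested in `tagged_corpus`.

    analyser:      dict surface form -> set of analyses (stand-in for A)
    tagged_corpus: iterable of (surface, analysis) pairs (stand-in for B)
    Returns a dict (surface, analysis) -> weight (stand-in for C), where
    the weight is -log(relative frequency); this scheme is an assumption.
    """
    counts = Counter(tagged_corpus)
    # "Intersection": keep only corpus pairs that the analyser also accepts.
    attested = {
        pair: n
        for pair, n in counts.items()
        if pair[1] in analyser.get(pair[0], ())
    }
    total = sum(attested.values())
    return {pair: -math.log(n / total) for pair, n in attested.items()}


# Toy data, invented for the example.
A = {
    "saw": {"saw<n><sg>", "see<vblex><past>"},
    "dog": {"dog<n><sg>"},
}
B = [("saw", "see<vblex><past>")] * 3 + [
    ("saw", "saw<n><sg>"),
    ("cat", "cat<n><sg>"),  # not in A, so dropped by the intersection
]
C = weight_analyses(A, B)
for (surface, analysis), weight in sorted(C.items(), key=lambda kv: kv[1]):
    print(f"{surface}\t{analysis}\t{weight:.3f}")
```

Lower weight means more frequent, so the verb reading of "saw" comes out ahead of the noun reading. Analyses in `A` that never occur in the corpus (here "dog") get no weight at all in this sketch and would need some default.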