Daniel Swanson
I believe all that remains now is to implement language variants (which should just be a matter of storing the strings from the command line and then comparing them with...
Two issues:
- I'm inclined to not implement ACX support directly in this, since that would involve entirely duplicating the code in `hfst-expand-equivalences`
- I need to rename `lt-proc`'s `-v/--variant`...
ok, so if I run it on `apertium-eng.eng.dix` it runs out of memory and crashes, but if I run it on smaller things it does fine. Further investigation required.
Turns out the issue was forgetting to move on to the next character after reading a `|` in a regex.
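To illustrate the kind of bug described here, below is a minimal sketch of splitting a pattern on `|`. It is not the actual parser code: the function name `split_alternatives` and the loop structure are hypothetical, and it ignores grouping and escapes. The point is only the index increment after a `|` is read. In a loop shaped like this, leaving that increment out means the same `|` is read again and again and empty alternatives are appended without end, which would be consistent with running out of memory on a large input.

```python
def split_alternatives(pattern: str) -> list[str]:
    """Split a pattern on '|' (hypothetical helper; no grouping or escapes)."""
    alternatives = []
    current = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|":
            # Close off the alternative collected so far.
            alternatives.append("".join(current))
            current = []
            # The crucial step: move on to the next character.
            # Without this line the loop never gets past the '|'.
            i += 1
            continue
        current.append(ch)
        i += 1
    # Whatever follows the last '|' is the final alternative.
    alternatives.append("".join(current))
    return alternatives


if __name__ == "__main__":
    print(split_alternatives("a|b|c"))
```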
It looks like the issue is that the parser assumes that if there is a newline in an annotation, then it will be the last character before the quotation, but...
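One way to avoid that assumption is to read a quoted annotation character by character and treat a newline like any other character, wherever it appears. The following is a rough sketch only, not the parser in question: `read_quoted` is a hypothetical name, and it assumes that a literal quote inside an annotation is written as two consecutive quotes, which may not match the format being parsed.

```python
def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Read a double-quoted string beginning at text[start].

    Returns the unquoted value and the index just past the closing quote.
    Assumption: an embedded quote is written as two quotes in a row.
    """
    if text[start] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # A doubled quote is a literal quote, not the end of the string.
            if i + 1 < len(text) and text[i + 1] == '"':
                out.append('"')
                i += 2
                continue
            return "".join(out), i + 1
        # Newlines are ordinary characters and may occur anywhere.
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted string")


if __name__ == "__main__":
    sample = 'text = "line one\nline two"\n'
    value, end = read_quoted(sample, sample.index('"'))
    print(repr(value), end)
```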
That's `DEL.TextGrid` from the post (added `.txt` so github would allow it).
Is there documentation somewhere of what the different formats are currently?
In answer to my own question, here are the current binary formats: ## Unigram Model 1 Uses `serialiser.h` Data structure: ``` map Model1 analyzed form => number of occurrences in training...
As far as I can tell, `serialiser.h` was added in the process of implementing the unigram taggers and is only used for them and the perceptron tagger.
Will it be necessary to provide a way of converting from the old binary format to the new one or can we assume that retraining on the same corpora will...