Daniel Swanson

Results: 66 comments by Daniel Swanson

I believe all that remains now is to implement language variants (which should just be a matter of storing the strings from the command line and then comparing them with...

Two issues:
- I'm inclined to not implement ACX support directly in this, since that would involve entirely duplicating the code in `hfst-expand-equivalences`
- I need to rename `lt-proc`'s `-v/--variant`...

ok, so if I run it on `apertium-eng.eng.dix` it runs out of memory and crashes, but if I run it on smaller things it does fine. Further investigation required.

Turns out the issue was forgetting to move on to the next character after reading a `|` in a regex.
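
For anyone who hasn't hit this class of bug, here's a minimal sketch of it. This is not the actual parser code: `split_alternatives` and everything in it are hypothetical names, and it assumes a scanner that walks the pattern by index and handles nothing except `|`.

```python
def split_alternatives(pattern: str) -> list[str]:
    """Split a pattern on '|' by scanning one character at a time.

    Hypothetical illustration only; no escapes, groups, or character
    classes are handled here.
    """
    alternatives = []
    current = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|":
            # Close off the alternative collected so far.
            alternatives.append("".join(current))
            current = []
            # The step that is easy to forget: move past the '|'.
            i += 1
            continue
        current.append(ch)
        i += 1
    # Whatever is left after the last '|' is the final alternative.
    alternatives.append("".join(current))
    return alternatives
```

If the `i += 1` inside the `|` branch is dropped, the loop keeps seeing the same `|` and appends empty alternatives until memory runs out.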

It looks like the issue is that the parser assumes that if there is a newline in an annotation, then it will be the last character before the quotation, but...

That's `DEL.TextGrid` from the post (added `.txt` so github would allow it).

Is there documentation somewhere of what the different formats are currently?

In answer to my own question, here are the current binary formats: ## Unigram Model 1 Uses `serialiser.h` Data structure: ``` map Model1 analyzed form => number of occurrences in training...

As far as I can tell, `serialiser.h` was added in the process of implementing the unigram taggers and is only used for them and the perceptron tagger.

Will it be necessary to provide a way of converting from the old binary format to the new one or can we assume that retraining on the same corpora will...