Daniel Swanson comments

Results: 98 comments by Daniel Swanson

Add Lttoolbox monodix compiler

I believe all that remains now is to implement language variants (which should just be a matter of storing the strings from the command line and then comparing them with...

Add Lttoolbox monodix compiler

Two issues:
- I'm inclined to not implement ACX support directly in this, since that would involve entirely duplicating the code in `hfst-expand-equivalences`
- I need to rename `lt-proc`'s `-v/--variant`...

Add Lttoolbox monodix compiler

OK, so if I run it on `apertium-eng.eng.dix`, it runs out of memory and crashes, but if I run it on smaller dictionaries it does fine. Further investigation required.

Add Lttoolbox monodix compiler

Turns out the issue was forgetting to move on to the next character after reading a `|` in a regex.
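
For illustration, here is a minimal sketch of how this kind of bug can exhaust memory. It is not the compiler's actual code: `split_alternatives` is a hypothetical helper that handles only a simplified regex with no grouping or escapes. If the index is not advanced after a `|`, the loop keeps seeing the same character and appends empty alternatives until memory runs out.

```python
def split_alternatives(pattern: str) -> list[str]:
    """Split a simplified regex (no groups, no escapes) on `|`.

    Hypothetical helper for illustration only.
    """
    alternatives = []
    current = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|":
            # Close the alternative collected so far and start a new one.
            alternatives.append("".join(current))
            current = []
            # Advance past the `|`. Omitting this line leaves `i` unchanged,
            # so the loop never ends and `alternatives` grows without bound.
            i += 1
            continue
        current.append(ch)
        i += 1
    # The text after the last `|` (possibly empty) is the final alternative.
    alternatives.append("".join(current))
    return alternatives


if __name__ == "__main__":
    print(split_alternatives("cat|dog|bird"))
```

A large dictionary is presumably more likely than a small one to contain a regex with `|`, which would fit the crash appearing only on the bigger input.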

Loading fails if `DEL` present in multi-line interval annotation

It looks like the issue is that the parser assumes that if there is a newline in an annotation, then it will be the last character before the quotation, but...
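
The comment above is cut off, so the following is only a sketch of the general point: a reader for quoted annotations should accept a newline anywhere inside the quotes, not only right before the closing one. `read_quoted` is a hypothetical function, not the library's parser, and it assumes that a doubled quote (`""`) inside the string stands for one literal quote, which is my understanding of the TextGrid text format.

```python
def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Read a double-quoted string that begins at text[start].

    Returns the unescaped contents and the index just past the closing
    quote. Hypothetical helper; assumes `""` escapes a literal quote.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            if i + 1 < len(text) and text[i + 1] == '"':
                # Doubled quote: emit one literal quote and skip both.
                out.append('"')
                i += 2
                continue
            # Single quote: this is the end of the string.
            return "".join(out), i + 1
        # Any other character, including a newline at any position,
        # is part of the annotation and is kept as-is.
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")


if __name__ == "__main__":
    sample = 'text = "first line\nsecond line"\n'
    print(read_quoted(sample, sample.index('"')))
```

Scanning character by character avoids relying on line boundaries at all, so the position of the newline within the annotation no longer matters.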

Loading fails if `DEL` present in multi-line interval annotation

That's `DEL.TextGrid` from the post (added `.txt` so github would allow it).

Merge tagger binary formats into a single format

Is there documentation somewhere of what the different formats are currently?

Merge tagger binary formats into a single format

In answer to my own question, here are the current binary formats:

## Unigram Model 1

Uses `serialiser.h`

Data structure:

```
map Model1 analyzed form => number of occurrences in training...
```

Merge tagger binary formats into a single format

As far as I can tell, `serialiser.h` was added in the process of implementing the unigram taggers and is only used for them and the perceptron tagger.

Merge tagger binary formats into a single format

Will it be necessary to provide a way of converting from the old binary format to the new one or can we assume that retraining on the same corpora will...