Daniel Swanson

66 comments by Daniel Swanson

https://gist.github.com/mr-martian/a8d1562ee95a0f636effd37b35e33171

The above is a Python implementation of our 2.5 methods of reading and writing binary data (3 for floats and 2 for everything else) and I could probably turn...

If `modes.xml` is to be trusted, Perceptron is only used by English, and Unigram 2 is used in the following places:

```
./apertium-nhi-nhn
./apertium-oci
./apertium-tur-tat
./apertium-tur-uzb
./apertium-nci-nhi
./apertium-fao-nor
./apertium-hin-pan
./apertium-kan-mar
...
```

I'm currently working on this in #130.

My current design plan is as follows (feedback welcome): The new binary format will start with a header like the transducer one, probably...

New proposal: All of the tagger models are equivalent to single-layer perceptrons with various restrictions on what the features can be. Thus I would like to amend my previous plan...
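To make the equivalence claim concrete, here is a minimal sketch (not the proposed implementation) of a unigram tagger rewritten as a single-layer perceptron whose only permitted feature is the word form itself. All names (`UNIGRAM`, `WEIGHTS`, `features`, `score`, `tag_unigram`, `tag_perceptron`), the toy probabilities, and the choice of log-probabilities as weights are assumptions made for illustration.

```python
import math

# Toy unigram model P(tag | word). The numbers are invented for illustration.
UNIGRAM = {
    "run": {"n": 0.3, "vblex": 0.7},
    "book": {"n": 0.8, "vblex": 0.2},
}

# The same model as perceptron weights: one weight per (feature, tag) pair,
# where the only feature allowed is the word form itself.
WEIGHTS = {
    (f"word={word}", tag): math.log(p)
    for word, tags in UNIGRAM.items()
    for tag, p in tags.items()
}


def features(word):
    """Feature extractor restricted to the word's own form."""
    return [f"word={word}"]


def score(word, tag, weights=WEIGHTS):
    """Single-layer perceptron score: sum of the weights of active features."""
    return sum(weights.get((f, tag), float("-inf")) for f in features(word))


def tag_unigram(word):
    """Classic unigram tagger: pick the most probable tag for the word."""
    return max(UNIGRAM[word], key=UNIGRAM[word].get)


def tag_perceptron(word, candidates):
    """Perceptron tagger: pick the highest-scoring candidate tag."""
    return max(candidates, key=lambda tag: score(word, tag))


if __name__ == "__main__":
    for word, tags in UNIGRAM.items():
        # Both taggers should agree on every word in the toy model.
        print(word, tag_unigram(word), tag_perceptron(word, list(tags)))
```

In this sketch the two taggers agree because the logarithm preserves the ordering of the probabilities. A richer model would differ only in which features `features()` is allowed to return.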

Yes, old `.prob` files would continue working, not because there would still be an HMM tagger, but because the file-reading code would interpret HMM files as Perceptrons where the features...

```
Parameters:
  window width W
  depth D
  beam search size B

Algorithm:
  for word in input:
    for reading in word:
      features = [extract from an FST] + [last W selected...
```

> And we keep lemmas dictionary-cased throughout the pipeline?

Yes.

> I think it should work, if the second module can have some lemma/PoS-specific rules and access to at least...

Actually the reason I started on this now was because I couldn't figure out how to handle case properly within postgen and so was trying to handle it after postgen....

So the proposed pipeline would be

| command | use |
|------|--------|
| `lt-proc -b` | generator |
| `cg-proc` | preferences |
| `lsx-proc -p` (or something) | postgen...

It occurs to me that the rules in the final step could have roughly the same syntax and semantics as LRX: ```xml ```