Lukas Schmelzeisen

33 comments by Lukas Schmelzeisen

I just noticed that sometimes the same formula images are shown multiple times. [This is how it is supposed to look](http://i.imgur.com/vAoDgUA.png).

Just some clarification on the GLM case: calculating the probability `P(b | a b a)`, for example, requires the lower-order probability `^P(b | _ b a)`. For this lower...
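To make the skipping concrete, here is a minimal sketch. It assumes the GLM derives the direct lower-order contexts of a history by replacing any one not-yet-skipped word with the skip symbol `_`, so `a b a` yields `_ b a`, `a _ a` and `a b _`, whereas standard backoff would only use `_ b a`. The class `GlmContexts` and its method are hypothetical and not part of the codebase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the codebase: lists the direct lower-order
// contexts of a GLM context by replacing one not-yet-skipped word with "_".
final class GlmContexts {
    // Assumed skip symbol, matching the `_` in `^P(b | _ b a)`.
    static final String SKIP = "_";

    private GlmContexts() {}

    static List<List<String>> lowerOrderContexts(List<String> context) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < context.size(); i++) {
            if (SKIP.equals(context.get(i))) {
                continue; // position already skipped, nothing left to drop here
            }
            List<String> skipped = new ArrayList<>(context);
            skipped.set(i, SKIP);
            result.add(skipped);
        }
        return result;
    }
}
```

Applied recursively (for example to `_ b a`, which yields `_ _ a` and `_ b _`), this enumerates every skip pattern of the history.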

This should now be implemented for all estimators. I'm reopening this issue, as it turns out that this fix doesn't apply to "mod-kneser-ney"-esque estimators with three discount values `D_1`, `D_2`, `D_3+`...
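For reference, the three discounts of modified Kneser-Ney are usually estimated from the count-of-counts `n_1` to `n_4` (the number of n-grams seen exactly one to four times) with the formulas of Chen and Goodman. The sketch assumes those standard estimates; the class `ModKneserNeyDiscounts` is hypothetical, and whether our estimators compute the discounts exactly this way is not confirmed here.

```java
// Hypothetical sketch: standard modified Kneser-Ney discount estimates
// (Chen and Goodman) from count-of-counts n1..n4. Not necessarily how the
// estimators in this project compute them.
final class ModKneserNeyDiscounts {
    private ModKneserNeyDiscounts() {}

    /** Returns {D_1, D_2, D_3+}; n4 may be zero, n1..n3 must be positive. */
    static double[] discounts(long n1, long n2, long n3, long n4) {
        if (n1 <= 0 || n2 <= 0 || n3 <= 0) {
            throw new IllegalArgumentException("n1, n2 and n3 must be positive");
        }
        double y = n1 / (n1 + 2.0 * n2);
        double d1 = 1 - 2 * y * n2 / n1;
        double d2 = 2 - 3 * y * n3 / n2;
        double d3Plus = 3 - 4 * y * n4 / n3;
        return new double[] {d1, d2, d3Plus};
    }

    /** Selects the discount for an n-gram count: 0 for unseen, else D_1, D_2 or D_3+. */
    static double discountFor(long count, double[] d) {
        if (count == 0) return 0.0;
        if (count == 1) return d[0];
        if (count == 2) return d[1];
        return d[2];
    }
}
```

With `n1 = 10`, `n2 = 5`, `n3 = 3`, `n4 = 2` this gives `D_1 = 0.5`, `D_2 = 1.1` and `D_3+ = 5/3`, so any fix that assumes a single discount per order has to pick the right one of the three per n-gram count.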

in `AbstractEstimator#logTrace` and `Output#logWithoutAnsi`.

I did some testing with the [`en0008t` corpus](https://gist.githubusercontent.com/lukasschmelzeisen/d4193c4bbc8711d30a09/raw/934e4865818fab81f39ebacf4b70db21d1d85405/en0008t) using a [vocab file](https://gist.githubusercontent.com/lukasschmelzeisen/d4193c4bbc8711d30a09/raw/3bd4648897e47f7030c4426a30f247594a6260a2/en0008t.vocab): I filtered `testing-samples-5.txt` down to a [vocab-filtered testing file](https://gist.githubusercontent.com/lukasschmelzeisen/d4193c4bbc8711d30a09/raw/93ed188fd25c3878dd21dbd0afb03bc1d4a3e9a9/testing-with-vocab). Querying it with `Cond5` resulted in only `2` zero probabilities out of `48019`...
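A minimal sketch of the filtering step, assuming the vocab file lists one word per line and that the filter drops every testing line containing a token outside that vocabulary (the exact rule used for the linked file is not stated). `VocabFilter` is a hypothetical name, and it works on already-loaded lines rather than files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: keeps only testing lines whose every whitespace-separated
// token is in the vocabulary; blank lines are dropped as well.
final class VocabFilter {
    private VocabFilter() {}

    static List<String> filter(List<String> testingLines, Set<String> vocab) {
        return testingLines.stream()
                .filter(line -> !line.trim().isEmpty())
                .filter(line -> Arrays.stream(line.trim().split("\\s+"))
                        .allMatch(vocab::contains))
                .collect(Collectors.toList());
    }
}
```

Under this rule every query in the filtered file consists of in-vocabulary words only, which is why zero probabilities should be rare when querying it.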

Yes, I meant at the start of training (this is fixed above). I don't believe your 80% figure, as I only got 2 zero probabilities in 48019. I ask you to redo your...
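Assuming the 80% refers to the share of queries that received zero probability, the counts quoted here give a rate several orders of magnitude lower:

```latex
\frac{2}{48019} \approx 4.2 \times 10^{-5} \approx 0.0042\,\% \ll 80\,\%
```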

Can you explain how `` and `` tags would make our requirement for a length distribution obsolete?

Commit 9e4c6a7e740eaa55183431a5748fe31e445054b4 scans the corpus for reserved symbols and refuses to run if it contains any. However, in the long run I would like to have some form of escaping for the input...
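A minimal sketch of both behaviours: refusing a corpus that contains reserved symbols, and one possible escaping scheme for the long run. It assumes whitespace-separated tokens, takes the reserved set as a parameter because the actual reserved symbols are not listed here, and uses a backslash prefix as a hypothetical escape character. `ReservedSymbols` is a hypothetical class, not the code from that commit.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the implementation from the commit.
final class ReservedSymbols {
    private static final String ESCAPE = "\\"; // assumed escape prefix

    private ReservedSymbols() {}

    /** Throws if any whitespace-separated token of the corpus is reserved. */
    static void checkCorpus(List<String> corpusLines, Set<String> reserved) {
        for (int i = 0; i < corpusLines.size(); i++) {
            for (String token : corpusLines.get(i).trim().split("\\s+")) {
                if (reserved.contains(token)) {
                    throw new IllegalArgumentException(
                            "Reserved symbol '" + token + "' in corpus line " + (i + 1));
                }
            }
        }
    }

    /**
     * Prefixes reserved tokens, and tokens already starting with the escape
     * prefix, with the escape prefix; all other tokens are returned unchanged.
     * Assumes no reserved symbol itself starts with the escape prefix.
     */
    static String escape(String token, Set<String> reserved) {
        if (token.startsWith(ESCAPE) || reserved.contains(token)) {
            return ESCAPE + token;
        }
        return token;
    }

    /** Inverse of escape: strips one leading escape prefix if present. */
    static String unescape(String token) {
        return token.startsWith(ESCAPE) ? token.substring(ESCAPE.length()) : token;
    }
}
```

Escaping only the tokens that would otherwise collide keeps ordinary words untouched, and since every token starting with the prefix gets escaped, `unescape` restores the original token unambiguously.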

Do we want something like `MarkovCond`?