berkeleylm issues

Documentation, usage, etc.

Is there any documentation on how to use the package, how to read n-grams from the data structure, etc.?

Convert to Maven for easier builds, and make ARPA reader robust

http://www1.icsi.berkeley.edu/Speech/docs/HTKBook3.2/node213_mn.html specifies the ARPA LM format. In particular, there doesn't seem to be a requirement for tabs, and in fact the CMU Sphinx LM files at http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/ don't use tabs...

witbrock
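To illustrate the point about tabs raised in the issue above, here is a minimal sketch of parsing a single ARPA n-gram entry by splitting on any run of whitespace instead of on tab characters. The class and method names (`ArpaLineSketch`, `parse`) are hypothetical and are not part of berkeleylm; the sketch also assumes that words themselves contain no whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not berkeleylm code: parses one ARPA n-gram entry line.
final class ArpaLineSketch {
    /** One parsed entry: log10 probability, the n-gram's words, optional backoff weight. */
    static final class Entry {
        final float logProb;
        final List<String> words;
        final Float backoff; // null when the line carries no backoff weight

        Entry(float logProb, List<String> words, Float backoff) {
            this.logProb = logProb;
            this.words = words;
            this.backoff = backoff;
        }
    }

    /** Parses an entry of the given n-gram order, accepting tabs or spaces as separators. */
    static Entry parse(String line, int order) {
        String[] fields = line.trim().split("\\s+");
        // Expected layout: logProb, then `order` words, then an optional backoff weight.
        if (fields.length != order + 1 && fields.length != order + 2) {
            throw new IllegalArgumentException("Unexpected field count in line: " + line);
        }
        float logProb = Float.parseFloat(fields[0]);
        List<String> words = Arrays.asList(Arrays.copyOfRange(fields, 1, order + 1));
        Float backoff = fields.length == order + 2 ? Float.valueOf(fields[order + 1]) : null;
        return new Entry(logProb, words, backoff);
    }
}
```

Because the n-gram order is known from the enclosing `\N-grams:` section, the field count is enough to tell whether a trailing backoff weight is present, so a tab-separated and a space-separated line parse to the same entry.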

no license information

The old homepage at https://code.google.com/p/berkeleylm/ declares that this software is published under the Apache License 2.0. I suggest adding this to the README here, too.

danielnaber

no start token with LmReaders.readNgramMapFromBinary?

When I use `LmReaders.readNgramMapFromBinary` and use the map to look up an n-gram that starts with the start symbol, I never get any occurrence counts > 0. Is this expected? Is...

danielnaber

Old mvn file

Can you push the latest version to Maven? The current version there is 1.1.2: http://mvnrepository.com/artifact/edu.berkeley.nlp/berkeleylm/ Thanks!

sonalgupta

Unknown Values

Hi, I trained a very simple bigram model using the `MakeKneserNeyArpaFromText` class. The model included two strings: "hello world" and "hello bye". The following scores were retrieved: "x hello" -101.22185...

ekravi

Calculating log probability over larger document

Hello, I would like to use this LM for classification and therefore I need to calculate the log probability of an entire document. One of the getLogProb() methods states:...

GoogleCodeExporter

Priority-Medium

Type-Defect

auto-migrated
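As a rough illustration of the question above, the log probability of a longer token sequence can be accumulated by scoring each token given at most `order - 1` preceding tokens and summing the results. The sketch below is self-contained and uses a hypothetical `NgramScorer` interface as a stand-in for a language model; it is not berkeleylm's API, and it ignores sentence boundaries and start/end symbols, which a real setup would likely need to handle.

```java
import java.util.List;

// Hypothetical sketch, not berkeleylm code: sums per-token log probabilities over a document.
final class DocumentScoreSketch {
    /** Stand-in for a language model (an assumption made for illustration). */
    interface NgramScorer {
        /** Log probability of the last word of the n-gram given the words before it. */
        double logProb(List<String> ngram);
    }

    /** Sums the score of every token, each conditioned on up to order - 1 previous tokens. */
    static double documentLogProb(List<String> tokens, int order, NgramScorer scorer) {
        double total = 0.0;
        for (int end = 1; end <= tokens.size(); end++) {
            int start = Math.max(0, end - order); // shorter context near the beginning
            total += scorer.logProb(tokens.subList(start, end));
        }
        return total;
    }
}
```

Summing in log space avoids the numerical underflow that multiplying many small probabilities would cause on a long document.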

Frequency Map

1

Good Afternoon, How to generate a map of frequency of n-grams? Thank you. Original issue reported on code.google.com by `[email protected]` on 8 Dec 2014 at 4:48

GoogleCodeExporter

Priority-Medium

Type-Defect

auto-migrated
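For the question above, one simple way to build an n-gram frequency map directly from tokenized text is sketched below in plain Java. This is a hypothetical helper (`NgramCountSketch.countNgrams`) that does not use berkeleylm's data structures; it assumes the text is already tokenized and joins the words of each n-gram with a single space to form the map key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not berkeleylm code: counts n-grams in a token list.
final class NgramCountSketch {
    /** Returns a map from each space-joined n-gram of length n to its occurrence count. */
    static Map<String, Integer> countNgrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of n tokens across the sequence.
        for (int i = 0; i + n <= tokens.size(); i++) {
            String key = String.join(" ", tokens.subList(i, i + n));
            counts.merge(key, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }
}
```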

Getting NAN on last trigram when using google binary

1

Hi, adding to my previous posts in issue 19, I am trying to use the Google binary (from Google Books) and get log probabilities of trigrams from some text. I...

GoogleCodeExporter

Priority-Medium

Type-Defect

auto-migrated

Unrealistic perplexity

3

I'm trying to evaluate a 5-gram model on a Vietnamese corpus but the perplexity doesn't seem to be right... What steps will reproduce the problem? 1. Download and extract problem.zip...

GoogleCodeExporter

Priority-Medium

Type-Defect

auto-migrated
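When checking a perplexity figure like the one questioned above, it can help to recompute it by hand from the summed log probability. The sketch below assumes the scores are base-10 logarithms, as in the ARPA format; if a model reports natural logs instead, the base must change accordingly. The class name `PerplexitySketch` is hypothetical and not part of berkeleylm.

```java
// Hypothetical sketch, not berkeleylm code: perplexity from a summed log10 probability.
final class PerplexitySketch {
    /**
     * Computes 10^(-totalLog10Prob / tokenCount), where tokenCount is the number of
     * scored tokens (assumption: every scored token is counted exactly once).
     */
    static double perplexity(double totalLog10Prob, long tokenCount) {
        if (tokenCount <= 0) {
            throw new IllegalArgumentException("tokenCount must be positive");
        }
        return Math.pow(10.0, -totalLog10Prob / tokenCount);
    }
}
```

A common source of implausible values is a mismatch between the log base of the scores and the base used here, or a token count that does not match the number of scored events.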
