berkeleylm no start token with LmReaders.readNgramMapFromBinary?

Open danielnaber opened this issue 9 years ago • 0 comments

When I load a model with LmReaders.readNgramMapFromBinary and use the resulting map to look up n-grams that begin with the start symbol, I never get an occurrence count greater than 0. Is this expected? Is it a problem with the software or with the data? I'm using the Google Books LM (German) from http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/

Example: map.get(Arrays.asList("<S>", "Das")) => null

I tried <S>, <s>, and _START_ as start symbols, but the result is the same for each: the lookup returns null.
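
For anyone who wants to reproduce the check, here is a minimal sketch of the probing I did. It uses a plain java.util.Map with Long counts as a stand-in for the map returned by LmReaders.readNgramMapFromBinary, so it runs without the model files; the class name StartSymbolProbe, the method probe, and the toy counts are made up for illustration and are not part of berkeleylm.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StartSymbolProbe {
    // Start-symbol spellings tried in this issue.
    static final List<String> CANDIDATES = Arrays.asList("<S>", "<s>", "_START_");

    /**
     * Looks up the bigram (candidate, word) for every candidate start symbol.
     * Returns candidate -> count, where the count is null if the key is absent.
     * The Map parameter is an assumption: a stand-in for the real n-gram map.
     */
    static Map<String, Long> probe(Map<List<String>, Long> counts, String word) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String start : CANDIDATES) {
            result.put(start, counts.get(Arrays.asList(start, word)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data, not taken from the Google Books LM.
        Map<List<String>, Long> counts = new HashMap<>();
        counts.put(Arrays.asList("Das", "ist"), 42L);

        // Prints one entry per candidate so the null results are easy to compare.
        System.out.println(probe(counts, "Das"));
    }
}
```

With the real map, the same loop would show at a glance whether any of the three spellings is present as the first token of a bigram.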

Dec 05 '15 13:12 danielnaber
