
no start token with LmReaders.readNgramMapFromBinary?

Open danielnaber opened this issue 9 years ago • 0 comments

When I load a map with LmReaders.readNgramMapFromBinary and use it to look up n-grams that begin with the start symbol, I never get an occurrence count > 0. Is this expected? Is it an issue with the software or with the data? I'm using the Google Books LM (German) from http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/

Example: map.get(Arrays.asList("<S>", "Das")) => null

I tried <S>, <s>, and _START_ as start symbols, but the results are the same.
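
For reference, the check can be sketched as follows. This is only a minimal illustration: it uses a plain java.util.Map as a stand-in for the map returned by LmReaders.readNgramMapFromBinary, and the class name StartSymbolProbe, the helper probe, and the sample count are hypothetical, not part of berkeleylm or the Google Books data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StartSymbolProbe {

    // Looks up the bigram (start, word) for each candidate start symbol.
    // A null value in the result means the bigram is absent from the map.
    static Map<String, Long> probe(Map<List<String>, Long> counts,
                                   List<String> candidates, String word) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String start : candidates) {
            result.put(start, counts.get(Arrays.asList(start, word)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in data; the real map would come from the binary LM.
        Map<List<String>, Long> counts = new HashMap<>();
        counts.put(Arrays.asList("Das", "ist"), 42L);

        List<String> candidates = Arrays.asList("<S>", "<s>", "_START_");
        // Prints one entry per candidate start symbol with its count or null.
        System.out.println(probe(counts, candidates, "Das"));
    }
}
```

With the real map, every candidate comes back as null for me, which is what prompted the question.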

danielnaber, Dec 05 '15 13:12