berkeleylm
No start token with LmReaders.readNgramMapFromBinary?
When I use LmReaders.readNgramMapFromBinary and use the resulting map to look up n-grams that start with the start symbol, I never get any occurrence counts > 0. Is this expected? Is it an issue with the software or with the data? I'm using the Google Books LM (German) from http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/
Example: map.get(Arrays.asList("<S>", "Das")) => null
I tried <S>, <s>, and _START_ as start symbols, but the results are the same.
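For reference, here is a minimal sketch of the kind of probe described above. To keep it self-contained, a plain java.util.Map stands in for the map returned by LmReaders.readNgramMapFromBinary; the class name StartSymbolProbe, the probe helper, and the sample entry are made up for illustration and are not part of berkeleylm.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StartSymbolProbe {
    // Candidate start symbols tried in this report.
    static final List<String> CANDIDATES = Arrays.asList("<S>", "<s>", "_START_");

    // Looks up the bigram [candidate, word] for each candidate.
    // A null value means the map has no entry for that bigram.
    static Map<String, Long> probe(Map<List<String>, Long> map, String word) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (String start : CANDIDATES) {
            results.put(start, map.get(Arrays.asList(start, word)));
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in data (hypothetical): the real map would be loaded from the binary.
        Map<List<String>, Long> map = new HashMap<>();
        map.put(Arrays.asList("Das", "ist"), 42L);

        // Print the lookup result for every candidate start symbol.
        System.out.println(probe(map, "Das"));
    }
}
```

Running the same loop against the real map should show at a glance whether any of the three candidates is stored as a start symbol at all.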