tongrams
A C++ library providing fast language model queries in compressed space.
Hi. I have written an abstraction layer around multiple libraries that do word splitting (`londonisacapitalofgreatbritain` must become `london is a capital of great britain`). All of the libraries rely on preprocessed n-grams...
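For readers unfamiliar with the task, here is a minimal sketch of n-gram-based word splitting, reduced to unigrams and dynamic programming. It is not the tongrams API or the method of any of the wrapped libraries: `UnigramModel`, `segment`, and `max_word_len` are hypothetical names, and the log-probabilities are assumed to come from whatever preprocessed n-gram store is in use.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model: word -> log-probability. In practice these
// scores would be looked up in a preprocessed n-gram store.
using UnigramModel = std::unordered_map<std::string, double>;

// Split `text` into the highest-scoring sequence of known words.
// Returns an empty vector if `text` cannot be fully covered by the model.
std::vector<std::string> segment(const std::string& text,
                                 const UnigramModel& model,
                                 std::size_t max_word_len = 20) {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    const std::size_t n = text.size();

    // best[i]: best score of any segmentation of text[0, i).
    // prev[i]: start offset of the last word in that segmentation.
    std::vector<double> best(n + 1, neg_inf);
    std::vector<std::size_t> prev(n + 1, 0);
    best[0] = 0.0;

    for (std::size_t end = 1; end <= n; ++end) {
        const std::size_t lo = end > max_word_len ? end - max_word_len : 0;
        for (std::size_t start = lo; start < end; ++start) {
            if (best[start] == neg_inf) continue;  // prefix not segmentable
            auto it = model.find(text.substr(start, end - start));
            if (it == model.end()) continue;       // unknown word
            const double score = best[start] + it->second;
            if (score > best[end]) {
                best[end] = score;
                prev[end] = start;
            }
        }
    }

    std::vector<std::string> words;
    if (best[n] == neg_inf) return words;  // no full segmentation found

    // Walk the back-pointers from the end, prepending each word.
    for (std::size_t end = n; end > 0; end = prev[end]) {
        words.insert(words.begin(), text.substr(prev[end], end - prev[end]));
    }
    return words;
}
```

A higher-order variant would score each candidate word conditioned on the previous word(s) instead of independently, which is where fast n-gram lookups matter.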
Currently, `boost` is used:
- for the preprocessor's `for_each`;
- for memory-mapped files;
- for iterating through gzipped files.
Create one master tool, `tongrams`, with sub-tools such as `tongrams build` and `tongrams query`, instead of many separate executables.
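One possible shape for such a master tool is a small dispatch table keyed on the sub-tool name. The sketch below is only an illustration of the proposal: `run_build`, `run_query`, and `dispatch` are hypothetical names, and the stubs stand in for whatever the existing separate executables do.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;
using SubTool = std::function<int(const Args&)>;

// Hypothetical stubs; real sub-tools would wrap the existing executables' logic.
int run_build(const Args& args) {
    std::cout << "build: " << args.size() << " argument(s)\n";
    return 0;
}

int run_query(const Args& args) {
    std::cout << "query: " << args.size() << " argument(s)\n";
    return 0;
}

// Look up argv[1] in the table and forward the remaining arguments.
// Returns 1 on a missing or unknown sub-tool, else the sub-tool's exit code.
int dispatch(const Args& argv) {
    static const std::map<std::string, SubTool> tools = {
        {"build", run_build},
        {"query", run_query},
    };

    if (argv.size() < 2) {
        std::cerr << "usage: tongrams <sub-tool> [args...]\n";
        return 1;
    }

    const auto it = tools.find(argv[1]);
    if (it == tools.end()) {
        std::cerr << "unknown sub-tool: " << argv[1] << "\n";
        return 1;
    }

    // Strip the program name and the sub-tool name before forwarding.
    return it->second(Args(argv.begin() + 2, argv.end()));
}
```

A `main` function would then simply forward `Args(argv, argv + argc)` to `dispatch` and return its result, so adding a sub-tool means adding one table entry rather than one more executable.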