colibri-core
Load corpora with mmap
Would it be possible to load corpora with mmap? This would make it possible to work with corpora larger than the available RAM, and it is much more efficient when only a small part of a file is going to be used anyway.
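To illustrate the idea, here is a minimal Python sketch of reading a small slice of a file through mmap. It is not colibri-core code: `read_slice`, `make_demo_file`, and the demo file layout (1000 little-endian 4-byte integers) are hypothetical and chosen only for illustration. They do not reflect colibri-core's encoded corpus format.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset` without reading the whole file."""
    with open(path, "rb") as f:
        # A length of 0 maps the entire file; the OS loads pages lazily,
        # so only the pages that are actually touched are read from disk.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def make_demo_file():
    """Write a hypothetical demo file: the integers 0..999 as 4-byte little-endian values."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as f:
        f.write(b"".join(i.to_bytes(4, "little") for i in range(1000)))
    return path


if __name__ == "__main__":
    path = make_demo_file()
    try:
        chunk = read_slice(path, 4 * 10, 8)  # the 11th and 12th values
        print([int.from_bytes(chunk[i:i + 4], "little") for i in (0, 4)])
    finally:
        os.remove(path)
```

Because the mapping is backed by the file rather than by a private copy, the file can be larger than physical RAM, and the cost of a lookup depends on how much of the file is accessed rather than on its total size. Note that an empty file cannot be mapped this way.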
When (encoded) corpora are read to build a pattern model, they are already read line by line and not kept in memory.
mmap is an interesting suggestion, though. I'd have to look into it more deeply to see whether it is feasible.