
Load corpora with mmap

Open · andreasvc opened this issue 8 years ago · 1 comment

Would it be possible to load corpora with mmap? This would make it possible to work with corpora larger than the available RAM, and it is much more efficient when only a small part of a file is going to be used anyway.

andreasvc, May 05 '16 17:05
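To illustrate the request, here is a minimal C++ sketch assuming a POSIX system (`open`, `fstat`, `mmap`, `madvise`). It maps a file read-only and scans it in place instead of copying it into a heap buffer. The kernel loads pages on demand and can evict them under memory pressure, so the file does not have to fit in RAM. `MappedFile` and `count_lines` are hypothetical names, not part of colibri-core, and the sketch treats the input as newline-delimited text rather than reproducing colibri-core's encoded corpus format.

```cpp
// Sketch only: POSIX mmap of a whole file, read-only. Names are hypothetical,
// not colibri-core API; input is assumed to be newline-delimited text.
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap, madvise
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// RAII wrapper: maps the file on construction, unmaps on destruction.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st{};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap with length 0 is an error, so skip empty files
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot mmap " + path);
            }
            data_ = static_cast<const char*>(p);
            // Hint that access is mostly sequential so the kernel may read ahead.
            ::madvise(p, size_, MADV_SEQUENTIAL);
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr) {
            [[maybe_unused]] int rc = ::munmap(const_cast<char*>(data_), size_);
            assert(rc == 0);  // unmapping our own mapping should not fail
        }
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // View of the file contents; valid only while this object is alive.
    std::string_view view() const { return {data_, size_}; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Count newline characters; pages are faulted in from disk as they are touched.
std::size_t count_lines(const std::string& path) {
    MappedFile f(path);
    std::size_t n = 0;
    for (char c : f.view())
        if (c == '\n') ++n;
    return n;
}
```

A full scan like `count_lines` still touches every page; the bigger win is random access, where only the pages actually read are loaded from disk.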

When (encoded) corpora are read to build a pattern model, they are already read line by line and not kept in memory.

mmap is an interesting suggestion, though. I'd have to look into it more deeply to see what is possible.

proycon, May 25 '16 14:05
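For contrast, a minimal sketch of the streaming approach described in the reply: the corpus is read one line at a time and only the accumulated counts stay in memory. It assumes a plain-text corpus with whitespace-separated tokens, one sentence per line. `count_unigrams` is a hypothetical name; colibri-core's actual pattern-model construction works on its encoded corpus format and counts more than single tokens.

```cpp
// Sketch only: stream a plain-text corpus line by line and count tokens.
// Hypothetical function, not colibri-core API.
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, std::size_t> count_unigrams(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::unordered_map<std::string, std::size_t> counts;
    std::string line, token;
    // Only the current line is held in memory; the same buffer is reused.
    while (std::getline(in, line)) {
        std::istringstream words(line);
        while (words >> token) ++counts[token];
    }
    return counts;
}
```

Here memory grows with the number of distinct tokens rather than with corpus size.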