
mmap() on low-memory systems

Open fungs opened this issue 11 years ago • 1 comments

Hi, first I'd like to say that Kraken is a really well-written program!

I found that kraken (the classification part) does not succeed on systems where the amount of main memory (+ swap) is smaller than the index (database.kdb). However, I believe this should be possible via memory mapping, particularly in this case, because the program only needs to read the data, which allows the OS to swap efficiently. While it should be technically possible, I cannot say whether it would perform well.

Issue: In the file quickfile.cpp, you correctly use the parameters PROT_READ and MAP_SHARED to trigger this kind of reading in read-only mode. However, it seems the database file is always opened in read-write mode, and I don't know why. IMO the correct way would be to use the read-only flags and either warn the user if this results in inefficient memory access behavior or require a parameter like '--force-memory-overcommit' in the classification program.
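To make the suggestion concrete, here is a minimal sketch of what a read-only mapping could look like using plain POSIX calls. It is not taken from quickfile.cpp: the names `ReadOnlyMapping`, `map_read_only`, and `unmap_read_only` are hypothetical and only illustrate opening the file with O_RDONLY and mapping it with PROT_READ and MAP_SHARED.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type, not part of Kraken: a read-only view of a file.
struct ReadOnlyMapping {
  const char *data;
  std::size_t size;
};

// Open `path` read-only and map it with PROT_READ / MAP_SHARED.
// The pages are backed by the file itself, so the kernel can drop them
// under memory pressure and re-read them from disk later.
ReadOnlyMapping map_read_only(const std::string &path) {
  assert(!path.empty());

  int fd = open(path.c_str(), O_RDONLY);  // read-only, not O_RDWR
  if (fd < 0) throw std::runtime_error("cannot open " + path);

  struct stat sb;
  if (fstat(fd, &sb) < 0) {
    close(fd);
    throw std::runtime_error("cannot stat " + path);
  }
  std::size_t size = static_cast<std::size_t>(sb.st_size);

  void *p = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (p == MAP_FAILED) throw std::runtime_error("cannot mmap " + path);

  return ReadOnlyMapping{static_cast<const char *>(p), size};
}

// Release the mapping created by map_read_only().
void unmap_read_only(const ReadOnlyMapping &m) {
  munmap(const_cast<char *>(m.data), m.size);
}
```

A caller would use something like `ReadOnlyMapping db = map_read_only("database.kdb");` and read through `db.data`. Whether this avoids the failure on low-memory systems without enabling over-commit depends on the kernel's accounting policy, so treat it as an assumption to verify rather than a guarantee.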

Cheers, Johannes

fungs, Oct 31 '14 13:10

Just to prove that this is technically possible: I ran Kraken with the full 100 GiB index on a notebook computer, serving the index files via SSHFS and enabling the Linux memory over-commit feature (echo 1 > /proc/sys/vm/overcommit_memory).

Processed 267079 sequences (156682865 bp) ...
267178 sequences (159.10 Mbp) processed in 263697.346s (0.1 Kseq/m, 0.04 Mbp/m).
  266913 sequences classified (99.90%)
  265 sequences unclassified (0.10%)

real 4394m58.581s
user 5m14.828s
sys 32m11.800s

It was certainly slow, but mostly due to the network access: user and sys CPU time add up to under 40 minutes of a wall-clock run of roughly 73 hours. This system had 3 GiB of memory.

fungs, Nov 03 '14 14:11