Timing out with pre-loading of chunked database
Dear KrakenUniq team,
I'm running into an issue that's clearly related to temporary files, and I'm not sure how to work around it.
Basically, I need to run many (a few thousand) samples through KrakenUniq. I have a 480 GB database and run it on 128 GB hosts using the following command:
krakenuniq --threads 16 --db $KUDB --preload-size 120G --output kuniq.output.txt --report-file kuniq.report.txt --paired $R1 $R2
I had issues with temporary files being written to Unix's $TMPDIR, which in our case was set to a Lustre directory; Lustre is known to have problems with many files in the same directory (anything over a few thousand is a no-go). I have therefore re-set TMPDIR to the current directory for each sample, which seems to have solved that issue.
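For reference, here is a minimal sketch of that workaround as a per-sample wrapper script. The script name run_one_sample.sh, the one-directory-per-sample layout, and the read-file naming are all illustrative assumptions, not anything prescribed by KrakenUniq; $KUDB is assumed to be exported by the caller as in the command above.

#!/usr/bin/env bash
# run_one_sample.sh (hypothetical wrapper): run one sample with TMPDIR
# pointed at the sample's own working directory, so KrakenUniq's
# temporary files avoid the shared Lustre $TMPDIR.
set -euo pipefail
SAMPLE_DIR="$1"                        # one working directory per sample (illustrative)
R1="$SAMPLE_DIR/reads_1.fastq.gz"      # illustrative read-file naming
R2="$SAMPLE_DIR/reads_2.fastq.gz"
cd "$SAMPLE_DIR"
export TMPDIR="$PWD"                   # KrakenUniq writes its temp files under $TMPDIR
krakenuniq --threads 16 --db "$KUDB" --preload-size 120G \
    --output kuniq.output.txt --report-file kuniq.report.txt \
    --paired "$R1" "$R2"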
However, depending on the load on our cluster's I/O, quite often KrakenUniq now simply finishes without producing any error or any output (the report header and some log output are written, but nothing else). I assume it's an issue with how long it waits for a response from the temp files. Is there a way to increase that timeout?
If the hosts have 128 GB, then I recommend you use a preload size much smaller than that, not 120G, when running KrakenUniq. Try it with 32G and see if that fixes the problem. Many servers don't actually let you use that much of the RAM (120 out of 128 GB), so it depends on the configuration.
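For concreteness, that is the original command with only the preload size lowered:

krakenuniq --threads 16 --db $KUDB --preload-size 32G --output kuniq.output.txt --report-file kuniq.report.txt --paired $R1 $R2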
Thank you for the suggestion! Unfortunately, it didn't fix the issue. Out of about 1500 runs, 800 quit with the same symptoms. I have reduced the number of threads, and the number of failures seems to have gone down significantly; I suspect that's because fewer temp files are written (I see that one is written per thread, and they are concatenated later). But I'm still at a loss as to how one can debug this.
I see what's happening now. You are running out of temp storage space. In order to run in less RAM, krakenuniq with the --preload-size option goes through the database in chunks, writing temporary results to the temp area for each chunk. So by running all those jobs at once, you're overwhelming your temp area. Just run fewer at a time.
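As a sketch of that throttling, a wrapper like the run_one_sample.sh above can be driven with a capped number of concurrent jobs, for example via xargs. The limit of 4 simultaneous jobs and the samples.txt list (one sample directory per line) are illustrative; pick a limit your temp storage can absorb.

# Run at most 4 KrakenUniq jobs at a time so their per-chunk
# temporary files do not exhaust the shared temp area.
xargs -P 4 -n 1 ./run_one_sample.sh < samples.txt

On a Slurm cluster, an array job with a concurrency cap (e.g. sbatch --array=1-1500%4) achieves the same effect.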