krakenuniq
Usage of --work-on-disk
Hi,
I see there is an option --work-on-disk to use less RAM, but when I tried to use this option, the software gave me this message:
srun --mem=300G --cpus-per-task=10 krakenuniq-build --db DBDIR --work-on-disk --verbose --threads 10
Found jellyfish v1.1.12
Kraken build set to minimize RAM usage.
Found 1 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '2574908690'
K-mer set created. [5m10.872s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 2505421358 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 2505421358 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [36m23.975s]
Creating seqID to taxID map (step 4 of 6)..
1278 sequences mapped to taxa. [0.013s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 2361119 taxa
taxDB construction finished. [2m59.077s]
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
You need to operate in RAM (flag -M) to use output to a different file (flag -o)
xargs: cat: terminated by signal 13
As explained in the log, it says not to use the flag -o, but I am not using it. Is this normal?
The version of krakenuniq is 0.5.8. I am using Slurm as the scheduler and the operating system is CentOS 8.
Thanks for your attention,
Brice
@braffes did you ever find a fix for this issue?
I removed the --work-on-disk option. I haven't looked into how to implement a fix for this issue, sorry.
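In case it helps, the command I fell back to is essentially the one from the original post with --work-on-disk dropped; this sketch assumes the node has enough RAM to hold the whole database:

srun --mem=300G --cpus-per-task=10 krakenuniq-build --db DBDIR --verbose --threads 10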
Thanks @braffes for the quick reply! It's a bummer that --work-on-disk currently doesn't work; this limits one's ability to create large KrakenUniq databases.
Nick, we are in the process of storing several very large (up to 390 GB) KrakenUniq databases on Amazon, so you can simply download them rather than having to build them. They're already up there, but we need to check them out first, and then we'll put a link on the KrakenUniq GitHub site. We'll have the links here as well: https://benlangmead.github.io/aws-indexes/
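Once the links are posted, fetching and unpacking one of the pre-built databases should look roughly like this; the URL and archive name below are placeholders, not the actual locations:

wget https://example-bucket.s3.amazonaws.com/krakenuniq/krakenuniq_db.tar.gz   # placeholder URL
mkdir -p DBDIR && tar -xzf krakenuniq_db.tar.gz -C DBDIR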
Do you have a KrakenUniq reference database for all reference species in GTDB-release207? That is what I'm currently working on.
No, not that one - you'll have to create it. (I'm not sure what GTDB is.)
The GTDB is a newer, sane taxonomy for bacteria and archaea, in which the taxonomy is directly defined from the genome phylogeny: https://gtdb.ecogenomic.org/
I'm going to need --work-on-disk to create the database; I ran out of memory on a node with 1 TB of RAM when running krakenuniq-build.
@salzberg any progress on fixing the --work-on-disk issue? I'd be happy to help if possible.