
usage of work-on-disk

Open braffes opened this issue 3 years ago • 8 comments

Hi,

I see there is a --work-on-disk option for using less RAM, but when I try to use it, the software gives me this message:

srun --mem=300G --cpus-per-task=10  krakenuniq-build --db DBDIR  --work-on-disk  --verbose --threads 10                                                                                                                                                              
Found jellyfish v1.1.12
Kraken build set to minimize RAM usage.
Found 1 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '2574908690'
K-mer set created. [5m10.872s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 2505421358 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 2505421358 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [36m23.975s]
Creating seqID to taxID map (step 4 of 6)..
1278 sequences mapped to taxa. [0.013s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 2361119 taxa
taxDB construction finished. [2m59.077s]
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
You need to operate in RAM (flag -M) to use output to a different file (flag -o)
xargs: cat: terminated by signal 13

The log says not to use the -o flag, but I am not using it. Is this normal?

The version of krakenuniq is 0.5.8. I am using Slurm as a scheduler, and the operating system is CentOS 8.

Thanks for your attention,

Brice

braffes avatar Sep 14 '21 15:09 braffes

@braffes did you ever find a fix to this issue?

nick-youngblut avatar Jun 08 '22 11:06 nick-youngblut

I removed the --work-on-disk option. I did not look into how to implement a fix for this issue, sorry.
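
For anyone hitting the same error, a minimal sketch of that workaround: the build command from the original report, with --work-on-disk dropped so that step 6 runs entirely in RAM (the memory and CPU requests are simply the ones from the original post, not a recommendation):

# Workaround sketch: same invocation as above, minus --work-on-disk,
# so the LCA database (step 6) is built in memory.
srun --mem=300G --cpus-per-task=10 \
    krakenuniq-build --db DBDIR --verbose --threads 10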

braffes avatar Jun 08 '22 19:06 braffes

Thanks @braffes for the quick reply! It's a bummer that --work-on-disk currently doesn't work, since this limits one's ability to create large KrakenUniq databases.

nick-youngblut avatar Jun 09 '22 06:06 nick-youngblut

Nick, we are in the process of storing several very large (up to 390 GB) KrakenUniq databases on Amazon, so you can simply download them rather than having to build them. They're already up there, but we need to check them out first, and then we'll put a link on the KrakenUniq GitHub site. We'll have the links here as well: https://benlangmead.github.io/aws-indexes/
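
Once those links are posted, using a prebuilt database should just be a matter of downloading and unpacking it into a directory passed to --db. A rough sketch, assuming each database is published as a tarball (the URL and database name below are placeholders, not real links):

# Hypothetical example: substitute the real link from
# https://benlangmead.github.io/aws-indexes/ once it is available.
wget https://example.com/krakenuniq/MY_DB.tar.gz
mkdir -p MY_DB && tar -xzvf MY_DB.tar.gz -C MY_DB

# The unpacked directory should contain database.kdb, database.idx and taxDB,
# and can then be used directly for classification:
krakenuniq --db MY_DB --threads 10 --report-file report.tsv reads.fastq > classified.tsv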

salzberg avatar Jun 09 '22 12:06 salzberg

Do you have a KrakenUniq reference database for all reference species in GTDB release 207? That is what I'm currently working on.

nick-youngblut avatar Jun 09 '22 12:06 nick-youngblut

No, not that one; you'll have to create it. (I'm not sure what GTDB is.)

salzberg avatar Jun 09 '22 13:06 salzberg

The GTDB is a newer, sane taxonomy for bacteria and archaea, in which the taxonomy is directly defined from the genome phylogeny: https://gtdb.ecogenomic.org/

I'm going to need --work-on-disk to create the database. I ran out of memory on a node with 1 TB of memory when running krakenuniq-build.
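
(A possible stopgap while --work-on-disk is broken, offered as an assumption rather than something confirmed in this thread: the build log above includes a database-reduction step ("Skipping step 2, no database reduction requested"), and krakenuniq-build appears to accept a --max-db-size option to trigger it, as kraken-build does. Check krakenuniq-build --help before relying on it.)

# Assumption: --max-db-size (in GB) enables the step-2 database reduction,
# shrinking the k-mer set so the later in-memory steps fit on the node.
# Verify the option name with `krakenuniq-build --help`.
krakenuniq-build --db DBDIR --threads 10 --verbose --max-db-size 800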

nick-youngblut avatar Jun 09 '22 13:06 nick-youngblut

@salzberg any progress on fixing the --work-on-disk issue? I'd be happy to help, if possible.

nick-youngblut avatar Jun 21 '22 07:06 nick-youngblut