Error in final step building database

Open CuypersBart opened this issue 2 years ago • 3 comments

Hi,

I got an error in the final step of building the database. Is it possible that the build actually completed and the database is usable? If not, do you know what might have gone wrong? Any help would be very much appreciated. I would really like to use KrakenUniq!

/x/krakenuniq-0.7.3/krakenuniq-build --db ./DB --kmer-len 31 --threads 36 --taxids-for-genomes --taxids-for-sequences

st.out

Kraken build set to minimize disk writes.
Found 39799 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /mydir/krakenuniq-0.7.3/krakenuniq/jellyfish-install/bin/jellyfish
Hash size not specified, using '36208587840'
K-mer set created. [59m49.238s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
K-mer set sorted. [2h55m45.947s]
Creating seqID to taxID map (step 4 of 6)..
88481 sequences mapped to taxa. [3m51.977s]
Creating taxDB (step 5 of 6)...
taxDB construction finished. [33.904s]
Building KrakenUniq LCA database (step 6 of 6)...
Adding taxonomy IDs for sequences
Adding taxonomy IDs for genomes

st.e

db_sort: Getting database into memory ...Loaded database with 32068135362 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 32068135362 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 2427426 taxa
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (358.389 GB) ... Done
Loaded database with 32068135362 keys with k of 31 [val_len 4, key_len 8].
xargs: cat: terminated by signal 13
/mydir/krakenuniq-0.7.3/krakenuniq/build_db.sh: line 313: 3787 Segmentation fault (core dumped) set_lcas $MEMFLAG -x -d $SORTED_DB_NAME -o database.kdb -i database.idx -v -b taxDB $PARAM $PARAM1 -t $KRAKEN_THREAD_CT -m seqid2taxid.map $DC -F <( cat_library ) -T > seqid2taxid-plus.map
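As a sanity check on the numbers in this log, the key count and record layout reported by db_sort can be multiplied out and compared with the size that was loaded into memory. This is only a sketch: it assumes each record occupies exactly key_len + val_len bytes with no padding, and that "GB" in the log means 2^30 bytes. Neither assumption is stated in the output.

```shell
#!/usr/bin/env bash
# Values copied from the stderr log above.
keys=32068135362
key_len=8
val_len=4

# Assumption: one record = key_len + val_len bytes, with no padding.
payload_bytes=$(( keys * (key_len + val_len) ))

# Assumption: the log's "GB" is 2^30 bytes. awk does the floating-point division;
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
payload_gib=$(LC_ALL=C awk -v b="$payload_bytes" 'BEGIN { printf "%.3f", b / (1024 * 1024 * 1024) }')

echo "payload: ${payload_bytes} bytes (${payload_gib} GiB)"
```

Under those assumptions the payload comes to 384817624344 bytes, or 358.389 GiB, which matches the figure in the log. It is also 1072 bytes smaller than the 384817625416-byte database0.kdb in the file listing, which would be consistent with a small file header, though that is a guess. Either way, the load itself looks consistent, and the segmentation fault happens afterwards.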

files generated

-rw-rw-r-- 1 384817625416 Jun 25 21:52 database0.kdb
-rw-rw-r-- 1 294 Jun 25 18:57 database-build.log
-rw-rw-r-- 1 8589934608 Jun 25 19:26 database.idx
-rw-rw-r-- 1 384817625416 Jun 25 18:57 database.jdb
drwxrwxr-x 7 4096 Jun 25 16:33 library
-rw-rw-r-- 1 5138393 Jun 25 17:56 library-files.txt
-rw-rw-r-- 1 6274702 Jun 25 21:57 seqid2taxid.map
-rw-rw-r-- 1 0 Jun 25 21:57 seqid2taxid-plus.map
-rw-rw-r-- 1 120834750 Jun 25 21:57 taxDB
-rw-rw-r-- 1 120834750 Jun 25 21:57 taxDB.orig
drwxrwxr-x 2 4096 Jun 25 16:33 taxonomy
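On the question of whether the build completed: the set_lcas command that crashed was meant to write database.kdb (its -o argument) and seqid2taxid-plus.map (its redirected output), yet the listing contains no database.kdb and seqid2taxid-plus.map is 0 bytes. A minimal sketch of that check follows. The helper name check_build is hypothetical, and treating these two files as the full set of required outputs is an assumption drawn from the failing command line, not from the KrakenUniq documentation.

```shell
#!/usr/bin/env bash
# check_build DIR: report whether the outputs of the set_lcas step exist and are non-empty.
# The two file names are taken from the failing command in the stderr log; assuming
# they are the only outputs that matter is a simplification.
check_build() {
  local db_dir=$1 status=0 f
  for f in database.kdb seqid2taxid-plus.map; do
    if [ -s "$db_dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_build ./DB && echo "set_lcas outputs present"
```

For the directory listed above, this check would flag both files, which suggests that step 6 did not finish.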

CuypersBart • Jun 25 '22 20:06

Hi, thank you for reporting this bug. This looks like a failure in set_lcas. I would like to reproduce/debug -- how can I reproduce this failure?

alekseyzimin • Jun 28 '22 21:06

Thank you for looking into this!

This is the code I used on our cluster (with 700 GB RAM and 36 cores):

`ml BLAST+/2.12.0-gompi-2021b
ml Perl/5.34.0-GCCcore-11.2.0
cd /scratch/xx/krakenuniq

/scratch/xx/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB taxonomy
/scratch/xx/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB --threads 36 --dust refseq/bacteria refseq/archaea
/scratch/xx/vsc20223/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB --threads 36 --dust refseq/vertebrate_mammalian/Chromosome/species_taxid=9606
/scratch/xx/vsc20223/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB --threads 36 --dust refseq/viral/Any viral-neighbors
/scratch/xx/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB --threads 36 --dust refseq/protozoa/Chromosome
/scratch/xx/krakenuniq-0.7.3/krakenuniq/krakenuniq-download --rsync --db ./DB --threads 36 --dust refseq/fungi/Chromosome

/scratch/xx/krakenuniq-0.7.3/krakenuniq/krakenuniq-build --db ./DB --kmer-len 31 --threads 36 --taxids-for-genomes --taxids-for-sequences`

CuypersBart • Jun 29 '22 12:06

@CuypersBart @alekseyzimin When I compiled krakenuniq with GCC 11, I got the same error. After recompiling with GCC 8.5.4, set_lcas works.

jameslz • Jul 26 '22 18:07
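Going only by the report above (a GCC 11 build segfaults in set_lcas, a GCC 8 build works), one could check the compiler's major version before compiling. The sketch below does that; the helper names gcc_major and warn_if_new_gcc are hypothetical, and the cutoff of 11 comes from this single report rather than from any documented limit. The module names in the reproduction steps (gompi-2021b, GCCcore-11.2.0) may point to the same compiler generation on the original cluster, but the thread does not confirm which compiler built the binaries.

```shell
#!/usr/bin/env bash
# gcc_major VERSION: print the major component of a version string such as "11.2.0".
gcc_major() {
  echo "${1%%.*}"
}

# warn_if_new_gcc VERSION: warn and return 1 when the major version is 11 or newer.
# The cutoff reflects only the single report in this thread.
warn_if_new_gcc() {
  local major
  major=$(gcc_major "$1")
  if [ "$major" -ge 11 ]; then
    echo "warning: GCC $1: a segfault in set_lcas was reported with GCC 11" >&2
    return 1
  fi
  return 0
}

# Usage: warn_if_new_gcc "$(gcc -dumpfullversion)"
```

Taking the version as an argument keeps the check independent of whichever gcc happens to be first on PATH, which matters on clusters where modules swap compilers.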