
Segmentation fault (core dumped). Error: crosstaxonfilterorf step died

Open keishaboateng97 opened this issue 5 months ago • 0 comments

Dear @martin-steinegger
I am trying to run Conterminator, but I keep getting an error saying "crosstaxonfilterorf step died". I have made my own mapping file, and I also have a gzipped sequence file. Do you have an idea about how I can fix this?

Best regards, Anna.

Below is the log:

Tmp tmp folder does not exist or is not a directory.
Create dir tmp
dna multifasta.txt db_seqs.mapping db_seqs tmp

MMseqs Version: 570993be7f5f31ee357183c9118bf3aa75575870
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace true
Alignment mode 3
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.9
Min. alignment length 100
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 1000
Compositional bias 0
Realign hits false
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 5
Gap extension cost 2
Threads 24
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 5.7
K-mer size 15
K-score 2147483647
Alphabet size 21
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring false
Exact k-mer matching 1
Mask residues 0
Mask lower case residues 0
Minimum diagonal score 25
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 2
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Omit consensus false
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1
Reverse frames 1
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Chain overlapping alignments 0
Merge query 1
Search type 0
Number search iterations 1
Start sensitivity 4
Search steps 1
Run a seq-profile search in slice mode false
Strand selection 2
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Database type 0
Shuffle input database true
Createdb mode 0
NCBI tax dump directory
Taxonomical mapping file
Blacklisted taxa 10239,12908,28384,81077,11632,340016,61964,48479,48510
Compare across kingdoms (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090)

createdb multifasta.txt tmp/13966145965188563130/sequencedb

Converting sequences [11111] 0s 847ms
Time for merging to sequencedb_h: 0h 0m 0s 130ms
Time for merging to sequencedb: 0h 0m 4s 36ms
Database type: Nucleotide
Time for merging to sequencedb.lookup: 0h 0m 0s 3ms
Time for processing: 0h 0m 8s 659ms
Tmp tmp/13966145965188563130/createtaxdb folder does not exist or is not a directory.
Create dir tmp/13966145965188563130/createtaxdb
createtaxdb tmp/13966145965188563130/sequencedb tmp/13966145965188563130/createtaxdb --tax-mapping-file db_seqs.mapping -v 3

Download taxdump.tar.gz
2024-01-08 09:28:54 URL:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz [64135145/64135145] -> "-" [1]
Database created
Remove temporary files
tmp/13966145965188563130/createtaxdb/createindex.sh: 58: [: Illegal number:
splitsequence tmp/13966145965188563130/sequencedb tmp/13966145965188563130/db_rev_split --max-seq-len 1000 --sequence-overlap 0 --sequence-split-mode 1 --create-lookup 0 --threads 24 --compressed 1 -v 3

Sequence split mode (--sequence-split-mode 0) and compressed (--compressed 1) can not be combined.
[=================================================================] 100.00% 11.19K 0s 51ms eta -
Time for merging to db_rev_split_h: 0h 0m 0s 332ms
Time for merging to db_rev_split: 0h 0m 0s 331ms
Time for processing: 0h 0m 1s 271ms
kmermatcher tmp/13966145965188563130/db_rev_split tmp/13966145965188563130/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

kmermatcher tmp/13966145965188563130/db_rev_split tmp/13966145965188563130/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

Database size: 265435 type: Nucleotide

Generate k-mers list for 1 split [=================================================================] 100.00% 265.43K 3s 875ms

Adjusted k-mer length 24
Sort kmer 0h 0m 2s 851ms
Sort by rep. sequence 0h 0m 1s 284ms
Time for fill: 0h 0m 0s 584ms
Time for merging to pref: 0h 0m 0s 209ms
Time for processing: 0h 0m 10s 726ms
tmp/13966145965188563130/pref exists and will be overwritten.
crosstaxonfilterorf tmp/13966145965188563130/sequencedb tmp/13966145965188563130/db_rev_split_h tmp/13966145965188563130/pref tmp/13966145965188563130/pref_cross --blacklist 10239,12908,28384,81077,11632,340016,61964,48479,48510 --kingdoms (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090) --threads 24 -v 3

Loading NCBI taxonomy
Loading nodes file ... Done, got 2550529 nodes
Loading merged file ... Done, added 75736 merged nodes.
Loading names file ... Done
Making matrix ... Done
Init RMQ ...Done
Segmentation fault (core dumped) ] 0.00% 1 eta -
Error: crosstaxonfilterorf step died
s175562@node06:/home/projects2/keisha/data$ ^C
s175562@node06:/home/projects2/keisha/data$ gzip multifasta.txt
s175562@node06:/home/projects2/keisha/data$ ls
CHECK genuslist.txt sci_name_taxid.txt taxontemp3.txt
cleaned_scinamegenus.txt identifier.txt sequence_length_file.txt taxontemp.txt
db_seqs.mapping mapping_file.tsv speciesprofile.txt taxon.txt
downloads_ncbi meta.xml taxdump.tar.gz temp.txt
downloads_ncbi2 multifasta.txt.gz taxidonly.txt tmp
downloads_ncbi_tmp ncbi_taxid.txt tax_ids.txt WoRMS_download_2023-09-01.zip
eml.xml output_lineage.txt taxontemp1.txt
genuslist_tmp.txt sciname_genus.txt taxontemp2.txt
s175562@node06:/home/projects2/keisha/data$ /home/ctools/conterminator/conterminator dna multifasta.txt.gz db_seqs.mapping db_seqs tmp
dna multifasta.txt.gz db_seqs.mapping db_seqs tmp

MMseqs Version: 570993be7f5f31ee357183c9118bf3aa75575870
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace true
Alignment mode 3
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.9
Min. alignment length 100
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 1000
Compositional bias 0
Realign hits false
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 5
Gap extension cost 2
Threads 24
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 5.7
K-mer size 15
K-score 2147483647
Alphabet size 21
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring false
Exact k-mer matching 1
Mask residues 0
Mask lower case residues 0
Minimum diagonal score 25
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 2
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Omit consensus false
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1
Reverse frames 1
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Chain overlapping alignments 0
Merge query 1
Search type 0
Number search iterations 1
Start sensitivity 4
Search steps 1
Run a seq-profile search in slice mode false
Strand selection 2
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Database type 0
Shuffle input database true
Createdb mode 0
NCBI tax dump directory
Taxonomical mapping file
Blacklisted taxa 10239,12908,28384,81077,11632,340016,61964,48479,48510
Compare across kingdoms (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090)

createdb multifasta.txt.gz tmp/13401688708221171541/sequencedb

Converting sequences [11111] 2s 414ms
Time for merging to sequencedb_h: 0h 0m 0s 138ms
Time for merging to sequencedb: 0h 0m 4s 206ms
Database type: Nucleotide
Time for merging to sequencedb.lookup: 0h 0m 0s 2ms
Time for processing: 0h 0m 10s 426ms
Tmp tmp/13401688708221171541/createtaxdb folder does not exist or is not a directory.
Create dir tmp/13401688708221171541/createtaxdb
createtaxdb tmp/13401688708221171541/sequencedb tmp/13401688708221171541/createtaxdb --tax-mapping-file db_seqs.mapping -v 3

Download taxdump.tar.gz
2024-01-08 09:47:06 URL:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz [64135032/64135032] -> "-" [1]
Database created
Remove temporary files
tmp/13401688708221171541/createtaxdb/createindex.sh: 58: [: Illegal number:
splitsequence tmp/13401688708221171541/sequencedb tmp/13401688708221171541/db_rev_split --max-seq-len 1000 --sequence-overlap 0 --sequence-split-mode 1 --create-lookup 0 --threads 24 --compressed 1 -v 3

Sequence split mode (--sequence-split-mode 0) and compressed (--compressed 1) can not be combined.
[=================================================================] 100.00% 11.19K 0s 54ms eta -
Time for merging to db_rev_split_h: 0h 0m 0s 333ms
Time for merging to db_rev_split: 0h 0m 0s 357ms
Time for processing: 0h 0m 1s 304ms
kmermatcher tmp/13401688708221171541/db_rev_split tmp/13401688708221171541/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

kmermatcher tmp/13401688708221171541/db_rev_split tmp/13401688708221171541/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

Database size: 265435 type: Nucleotide

Generate k-mers list for 1 split [=================================================================] 100.00% 265.43K 3s 547ms

Adjusted k-mer length 24
Sort kmer 0h 0m 1s 865ms
Sort by rep. sequence 0h 0m 1s 499ms
Time for fill: 0h 0m 0s 589ms
Time for merging to pref: 0h 0m 0s 209ms
Time for processing: 0h 0m 9s 545ms
tmp/13401688708221171541/pref exists and will be overwritten.
crosstaxonfilterorf tmp/13401688708221171541/sequencedb tmp/13401688708221171541/db_rev_split_h tmp/13401688708221171541/pref tmp/13401688708221171541/pref_cross --blacklist 10239,12908,28384,81077,11632,340016,61964,48479,48510 --kingdoms (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090) --threads 24 -v 3

Loading NCBI taxonomy
Loading nodes file ... Done, got 2550529 nodes
Loading merged file ... Done, added 75736 merged nodes.
Loading names file ... Done
Making matrix ... Done
Init RMQ ...Done
Segmentation fault (core dumped) ] 0.00% 1 eta -
Error: crosstaxonfilterorf step died
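Since the crash happens right after the taxonomy is loaded, one input that could be checked first is the custom mapping file. This is not confirmed as the cause of the segmentation fault. The sketch below assumes the mapping file is tab-separated (sequence identifier, then NCBI taxon ID) and that the identifier is the first whitespace-delimited token of each FASTA header. The helper names `open_text`, `fasta_ids` and `check_mapping` are hypothetical and not part of Conterminator or MMseqs2.

```python
import gzip
from typing import Iterable, List, Set, Tuple


def open_text(path: str):
    """Open a plain or gzipped text file for reading (decided by the .gz suffix)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every FASTA header line.

    Assumption: this token is the identifier used in the mapping file.
    """
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def check_mapping(
    lines: Iterable[str], ids: Set[str]
) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Return (malformed rows, FASTA identifiers without a mapping entry).

    Assumption: each row is "<identifier>\\t<taxon ID>" with a positive integer ID.
    A row is reported as malformed if it does not have exactly two tab-separated
    fields or if the second field is not a positive integer.
    """
    mapped: Set[str] = set()
    bad_rows: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        taxid = fields[1].strip() if len(fields) == 2 else ""
        if not taxid.isdecimal() or int(taxid) <= 0:
            bad_rows.append((number, line))
            continue
        mapped.add(fields[0])
    return bad_rows, sorted(ids - mapped)
```

With the file names from the log, this could be called as `check_mapping(open_text("db_seqs.mapping"), fasta_ids(open_text("multifasta.txt.gz")))`. Any malformed rows or unmapped identifiers it reports would be worth fixing before rerunning.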

keishaboateng97, Jan 08 '24 08:01