conterminator frozen at stage rescorediagonal

Open tougai opened this issue 3 years ago • 8 comments

Hi, I am trying to test conterminator on a very simple file to start with, but it freezes at the rescorediagonal stage. With the example files dna.fas and dna.mapping, everything works fine.

here is my fasta file toto.fa:

>chr1
TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTATCATCGACTAGAGGCTC
ATAAACCTCACCCCACATATGTTTCCTTGCCATAGATTACATTCTTGGATTTCTGGTGGA
AACCATTTCTTGCTTAAAAACTCGTACGTGTTAGCCTTCGGTATTATTGAAAATGGTCAT
TCATGGCTATTTTTCGGCAAAATGGGGGTTGTGTGGCCATTGATCGTCGACCAGAGGCTC

my mapping file toto.fa.taxidmapping:

chr1 4577
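For anyone reproducing this, a quick sanity check of the two inputs may help rule out a header/mapping mismatch. This is only a minimal sketch, not part of conterminator: the helper names (`fasta_ids`, `read_mapping`, `unmapped_ids`) are hypothetical, and it assumes the mapping file holds whitespace-separated `identifier taxid` pairs and that the identifier is the first word after `>` in each FASTA header.

```python
import io


def fasta_ids(handle):
    """Yield the identifier (first word after '>') of every FASTA record."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def read_mapping(handle):
    """Parse 'identifier taxid' lines into a dict, skipping short or blank lines.

    Assumption: columns are separated by whitespace (tab or space).
    A non-numeric taxid raises ValueError.
    """
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = int(fields[1])
    return mapping


def unmapped_ids(fasta_handle, mapping_handle):
    """Return the FASTA identifiers that have no entry in the mapping."""
    mapping = read_mapping(mapping_handle)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in mapping]


# Inline copies of the inputs from this report (first sequence line only).
fasta = io.StringIO(
    ">chr1\n"
    "TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTATCATCGACTAGAGGCTC\n"
)
mapping = io.StringIO("chr1 4577\n")

missing = unmapped_ids(fasta, mapping)
print("unmapped identifiers:", missing)
```

To run it on the real files, the two `io.StringIO` objects can be swapped for `open("toto.fa")` and `open("toto.fa.taxidmapping")`. An empty list only means every header has a taxid; it says nothing about why rescorediagonal hangs.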

my command line: conterminator dna toto.fa toto.fa.taxidmapping out tmp

and the log:

Tmp tmp folder does not exist or is not a directory.
Create dir tmp
dna toto.fa toto.fa.taxidmapping out tmp

MMseqs Version:                         570993be7f5f31ee357183c9118bf3aa75575870
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Add backtrace                           true
Alignment mode                          3
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0.9
Min. alignment length                   100
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0
Coverage mode                           0
Max sequence length                     1000
Compositional bias                      0
Realign hits                            false
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Score bias                              0
Gap open cost                           5
Gap extension cost                      2
Threads                                 24
Compressed                              0
Verbosity                               3
Seed substitution matrix                nucl:nucleotide.out,aa:VTML80.out
Sensitivity                             5.7
K-mer size                              15
K-score                                 2147483647
Alphabet size                           21
Split database                          0
Split mode                              2
Split memory limit                      0
Diagonal scoring                        false
Exact k-mer matching                    1
Mask residues                           0
Mask lower case residues                0
Minimum diagonal score                  25
Spaced k-mers                           1
Spaced k-mer pattern
Local temporary path
Rescore mode                            2
Remove hits by seq. id. and coverage    false
Sort results                            0
Mask profile                            1
Profile e-value threshold               0.001
Use global sequence weighting           false
Allow deletions                         false
Filter MSA                              1
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Omit consensus                          false
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1
Reverse frames                          1
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Add orf stop                            false
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Number search iterations                1
Start sensitivity                       4
Search steps                            1
Run a seq-profile search in slice mode  false
Strand selection                        2
Disk space limit                        0
MPI runner
Force restart with latest tmp           false
Remove temporary files                  true
Database type                           0
Shuffle input database                  true
Createdb mode                           0
NCBI tax dump directory
Taxonomical mapping file
Blacklisted taxa                        10239,12908,28384,81077,11632,340016,61964,48479,48510
Compare across kingdoms                 (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090)

createdb toto.fa tmp/6246057436143434068/sequencedb

Converting sequences

Time for merging to sequencedb_h: 0h 0m 0s 116ms
Time for merging to sequencedb: 0h 0m 0s 115ms
Database type: Nucleotide
Time for merging to sequencedb.lookup: 0h 0m 0s 1ms
Time for processing: 0h 0m 0s 438ms
Tmp tmp/6246057436143434068/createtaxdb folder does not exist or is not a directory.
Create dir tmp/6246057436143434068/createtaxdb
createtaxdb tmp/6246057436143434068/sequencedb tmp/6246057436143434068/createtaxdb --tax-mapping-file toto.fa.taxidmapping -v 3

Download taxdump.tar.gz
2021-06-17 17:21:59 URL:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz [55403423/55403423] -> "-" [1]
Database created
Remove temporary files
tmp/6246057436143434068/createtaxdb/createindex.sh: line 58: [: : integer expression expected
splitsequence tmp/6246057436143434068/sequencedb tmp/6246057436143434068/db_rev_split --max-seq-len 1000 --sequence-overlap 0 --sequence-split-mode 1 --create-lookup 0 --threads 24 --compressed 1 -v 3

Time for processing: 0h 0m 0s 37ms
kmermatcher tmp/6246057436143434068/db_rev_split tmp/6246057436143434068/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

kmermatcher tmp/6246057436143434068/db_rev_split tmp/6246057436143434068/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 24 --compressed 0 -v 3

Database size: 1 type: Nucleotide

Generate k-mers list for 1 split
[=================================================================] 100.00% 1 eta -

Adjusted k-mer length 24
Sort kmer 0h 0m 0s 0ms
Sort by rep. sequence 0h 0m 0s 0ms
Time for fill: 0h 0m 0s 1ms
Time for merging to pref: 0h 0m 0s 3ms
Time for processing: 0h 0m 0s 27ms
tmp/6246057436143434068/pref exists and will be overwritten.
crosstaxonfilterorf tmp/6246057436143434068/sequencedb tmp/6246057436143434068/db_rev_split_h tmp/6246057436143434068/pref tmp/6246057436143434068/pref_cross --blacklist 10239,12908,28384,81077,11632,340016,61964,48479,48510 --kingdoms (2||2157),4751,33208,33090,(2759&&!4751&&!33208&&!33090) --threads 24 -v 3

Loading NCBI taxonomy
Loading nodes file ... Done, got 2337439 nodes
Loading merged file ... Done, added 63224 merged nodes.
Loading names file ... Done
Making matrix ... Done
Init RMQ ...Done
[=================================================================] 100.00% 1 eta -
Time for merging to pref_cross: 0h 0m 0s 41ms
Time for processing: 0h 0m 6s 156ms
rescorediagonal tmp/6246057436143434068/db_rev_split tmp/6246057436143434068/db_rev_split tmp/6246057436143434068/pref_cross tmp/6246057436143434068/aln --sub-mat nucl:nucleotide.out,aa:blosum62.out --rescore-mode 2 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0 -a 1 --cov-mode 0 --min-seq-id 0.9 --min-aln-len 100 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 24 --compressed 0 -v 3

tougai avatar Jun 17 '21 15:06 tougai