MMseqs2 linsearch

linsearch

Open CaroleBelliardo opened this issue 5 years ago • 2 comments

Expected Behavior

I tried both easy-linsearch and linsearch, and both have the same problem: the output files are empty. First, I tried easy-linsearch with this command:

$ mmseqs easy-linsearch /work/cbelliardo/6-ensembl_clust/metag_G-rosea.fa /bighub/hub/DB/mmseq_swissprot/swissprot out tmp --search-type 2 -v 3 --threads 8

The output file (out) is empty, and no error message is printed.

Then I tried the linsearch command:

$ mmseqs createdb metag_G-rosea.fa queryDB
$ mmseqs linsearch queryDB /bighub/hub/DB/mmseq_swissprot/swissprot resultDB tmp
$ mmseqs convertalis queryDB /bighub/hub/DB/mmseq_swissprot/swissprot resultDB resultDB.m8

I get the same result: resultDB.m8 is empty.

The metag_G-rosea.fa file is a FASTA file with 80 characters per line. I tried the same file with search and it works really well, so the file seems to be fine.
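The symptom can be confirmed by testing whether each result file exists and has a non-zero size. The following is only a minimal sketch: the function name `report_output` is hypothetical and not part of MMseqs2, and the file names are the ones used in the commands in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of MMseqs2): report whether a result file
# exists and contains data. `[ -s FILE ]` is true only for a non-empty file.
report_output() {
    if [ ! -e "$1" ]; then
        echo "$1: missing"
    elif [ -s "$1" ]; then
        echo "$1: $(wc -l < "$1" | tr -d ' ') lines"
    else
        echo "$1: empty"
    fi
}

# File names taken from the commands in this report.
report_output out
report_output resultDB.m8
```

A file reported as "empty" was created by the tool but received no rows, which is a different situation from a file that was never written.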

MMseqs Output (for bugs)

createdb metag_G-rosea.fa queryDB

MMseqs Version: 10.6d92c
Max sequence length 65535
Split seq. by length true
Database type 0
Do not shuffle input database true
Offset of numeric ids 0
Compressed 0
Verbosity 3

Assuming DNA database, forcing parameter --dont-split-seq-by-len true
Converting sequences
[
Time for merging into queryDB_h by mergeResults: 0h 0m 0s 107ms
Time for merging into queryDB by mergeResults: 0h 0m 0s 116ms
Time for merging into queryDB.lookup by mergeResults: 0h 0m 0s 5ms
Time for processing: 0h 0m 0s 479ms

Tmp tmp folder does not exist or is not a directory.
extractorfs queryDB tmp/2730103712073724212/q_orfs_aa --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 1 --use-all-table-starts 0 --id-offset 0 --threads 64 --compressed 0 -v 3

[=================================================================] 101 0s 57ms
Time for merging into tmp/2730103712073724212/q_orfs_aa_h by mergeResults: 0h 0m 0s 174ms
Time for merging into tmp/2730103712073724212/q_orfs_aa by mergeResults: 0h 0m 0s 182ms
Time for processing: 0h 0m 0s 869ms

kmersearch tmp/2730103712073724212/q_orfs_aa /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/search/pref --seed-sub-mat nucl:nucleotide.out,aa:blosum62.out --kmer-per-seq 21 --mask 0 --mask-lower-case 0 --cov-mode 0 -c 0 --max-seq-len 65535 --pick-n-sim-kmer 1 --split-memory-limit 0 --threads 64 --compressed 0 -v 3

kmersearch tmp/2730103712073724212/q_orfs_aa /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/search/pref --seed-sub-mat nucl:nucleotide.out,aa:blosum62.out --kmer-per-seq 21 --mask 0 --mask-lower-case 0 --cov-mode 0 -c 0 --max-seq-len 65535 --pick-n-sim-kmer 1 --split-memory-limit 0 --threads 64 --compressed 0 -v 3

Estimated memory consumption 4 MB
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)
Process file into 1 parts
Generate k-mers list 0
[=================================================================] 14.93K 0s 32ms

Time for fill: 0h 0m 0s 35ms
Sort kmer ... Time for sort: 0h 0m 0s 39ms
Time to find k-mers: 0h 0m 0s 608ms
Time to sort: 0h 0m 0s 0ms
Time for merging into tmp/2730103712073724212/search/pref by mergeResults: 0h 0m 0s 5ms
Time for processing: 0h 0m 0s 729ms

rescorediagonal /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/q_orfs_aa tmp/2730103712073724212/search/pref tmp/2730103712073724212/search/reverse_ungapaln --sub-mat nucl:nucleotide.out,aa:blosum62.out --rescore-mode 2 --filter-hits 0 -e 0.001 -c 0.9 -a 0 --cov-mode 1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

Index version: 15
Generated by: 10.6d92c
ScoreMatrix: :
[=================================================================] 323 0s 253ms
Time for merging into tmp/2730103712073724212/search/reverse_ungapaln by mergeResults: 0h 0m 0s 177ms
Time for processing: 0h 0m 0s 855ms

filterdb tmp/2730103712073724212/search/pref tmp/2730103712073724212/search/pref_filter --filter-file tmp/2730103712073724212/search/reverse_ungapaln --positive-filter 0

Filtering with filter files.
[=================================================================] 323 0s 12ms
Time for merging into tmp/2730103712073724212/search/pref_filter by mergeResults: 0h 0m 0s 171ms
Time for processing: 0h 0m 0s 397ms

align /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/q_orfs_aa tmp/2730103712073724212/search/pref_filter tmp/2730103712073724212/search/reverse_aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 0 --alignment-mode 2 -e 100000 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open 11 --gap-extend 1 --threads 64 --compressed 0 -v 3

Index version: 15
Generated by: 10.6d92c
ScoreMatrix: :
Compute score and coverage
Query database size: 561568 type: Aminoacid
Target database size: 14926 type: Aminoacid
Calculation of alignments
[=================================================================] 323 0s 62ms
Time for merging into tmp/2730103712073724212/search/reverse_aln by mergeResults: 0h 0m 0s 109ms

335 alignments calculated.
323 sequence pairs passed the thresholds (0.964179 of overall calculated).
1.000000 hits per query sequence.
Time for processing: 0h 0m 0s 427ms

swapresults /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/q_orfs_aa tmp/2730103712073724212/search/reverse_aln tmp/2730103712073724212/aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 0.001 --split-memory-limit 0 --gap-open 11 --gap-extend 1 --threads 64 --compressed 0 --db-load-mode 0 -v 3

Index version: 15
Generated by: 10.6d92c
ScoreMatrix: :
Computing offsets.
[=================================================================] 323 0s 3ms

Reading results.
[=================================================================] 323 0s 0ms

Output database: tmp/2730103712073724212/aln
[=================================================================] 14.93K 0s 154ms

Time for merging into tmp/2730103712073724212/aln by mergeResults: 0h 0m 0s 147ms
Time for processing: 0h 0m 0s 394ms

offsetalignment queryDB tmp/2730103712073724212/q_orfs_aa /bighub/hub/DB/mmseq_swissprot/swissprot.linidx /bighub/hub/DB/mmseq_swissprot/swissprot.linidx tmp/2730103712073724212/aln resultDB --chain-alignments 0 --merge-query 1 --search-type 0 --threads 64 --compressed 0 --db-load-mode 0 -v 3
Index version: 15
Generated by: 10.6d92c
ScoreMatrix: :
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 2ms
Writing results to: resultDB
==[===============================================================] 101 0s 3ms

Time for merging into resultDB by mergeResults: 0h 0m 0s 169ms
Time for processing: 0h 0m 1s 64ms

convertalis queryDB /bighub/hub/DB/mmseq_swissprot/swissprot resultDB resultDB.m8

MMseqs Version: 10.6d92c
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Alignment format 0
Format alignment output query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Translation table 1
Gap open cost 11
Gap extension cost 1
Database output false
Preload mode 0
Search type 0
Threads 64
Compressed 0
Verbosity 3

[=================================================================] 101 0s 8ms
Time for merging into resultDB.m8 by mergeResults: 0h 0m 0s 160ms
Time for processing: 0h 0m 1s 306ms
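To compare what the align step reported with what ends up in the final file, the summary lines can be pulled out of a saved copy of this log. This is a minimal sketch under stated assumptions: the log file name `linsearch.log` and the function `summarize_log` are hypothetical, and the matched phrases are the ones that appear in the log in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the alignment summary lines from a saved MMseqs2 log.
summarize_log() {
    grep -E 'alignments calculated|sequence pairs passed|hits per query' "$1"
}

# Only run on the (assumed) log file if it is actually present.
if [ -f linsearch.log ]; then
    summarize_log linsearch.log
fi

# Cross-check the fraction reported in this log: 323 pairs passed out of 335 calculated.
awk 'BEGIN { printf "%.6f\n", 323 / 335 }'
```

The computed fraction agrees with the 0.964179 printed by the align step, so the log itself reports 323 accepted sequence pairs before the results are swapped and converted.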

Context

There is no information in the manual about this command; maybe I missed some steps.

Your Environment

I use version 10.6d92c, installed with conda.

Server specifications: 512 GB RAM; 64 CPUs
Operating system and version: Linux, latest release

Thanks a lot for your help!

Feb 11 '20 16:02 CaroleBelliardo
