parasail
parasail copied to clipboard
SImple DNA search
I haven't been able to get a simple DNA search to produce a result. I am sure I am doing something silly. I couldn't find a simple invocation example for a DNA search in the docs....so:
./parasail_aligner -d -a sw_striped_sse41_128_64 -f human-1mb.fa -q aluSx1.fa funcname: sw_striped_sse41_128_64 cutoff: 7 use filter: yes gap_extend: 1 gap_open: 10 matrix: (null) AOL: 80 SIM: 40 OS: 30 file: human-1mb.fa query: aluSx1.fa output: parasail.csv read and pack time: 0.0121 seconds sentinal: $ end of packed buffer: 1000314 number of sequences: 2 number of queries: 1 number of db seqs: 1 induced SA time: 0.2240 seconds naive BWT time: 0.0055 seconds clamp LCP time: 0.0119 seconds ESA time: 2.5268 seconds possible pairs: 1 generated pairs: 0 unique pairs: 0 omp num threads: 20 pairs and vpairs were empty
This 1MB of sequence contains many Alu sequences including the top match found by cross_match:
2538 2.88 0.32 0.64 AluSx1#SINE/Alu 1 312 (0) C Human (930660) 69340 69030 C AluSx1#SINE/Alu 312 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGT 263 Human 69030 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGT 69079 C AluSx1#SINE/Alu 262 CGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACCTCCGC 213 v Human 69080 CGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGC 69129 C AluSx1#SINE/Alu 212 CTCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGAT 163 v v i Human 69130 CTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGAC 69179 C AluSx1#SINE/Alu 162 TACAGGCGCCCGCCACCACGCCCGGCTAATTTTT-GTATTTTTAGTAGAG 114 i - Human 69180 TACAGGCGCCCGCTACCACGCCCGGCTAATTTTTTGTATTTTTAGTAGAG 69229 C AluSx1#SINE/Alu 113 ACGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGG 64 i i v v -- Human 69230 ACGGGGTTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTC--G 69277 C AluSx1#SINE/Alu 63 TGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC 14 Human 69278 TGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC 69327 C AluSx1#SINE/Alu 13 ACCGCGCCCGGCC 1 Human 69328 ACCGCGCCCGGCC 69340
Thanks for trying out parasail. I admit that DNA wasn't the first goal of parasail, but no doubt it should work. If there's a bug, I'll fix it -- thanks for filing an issue!
For starters, it looks like you are running the parasail_aligner application correctly, though you may have missed on command-line parameter. The '-x' parameter turns off the use of a suffix array filter. The filter tries to find pairs of sequences with long exact-matching substrings coming from your input dataset and optional query dataset. I would try running the aligner and turn off the filter, just to be sure.
The other concern I have is that the aligner is reporting one sequence in your input dataset and one sequence in your query dataset. Is this correct? It is a completely valid way to use the tool, I just want to make sure that is indeed what you gave to it as inputs.
Is there any way you could make your input files available to me for my own testing? I really want to get this working for you.
I re-ran with the "-x" flag and got one alignment out:
time ./parasail_aligner -d -x -a sw_striped_sse41_128_64 -f human-1mb.fa -q aluSx1.fa -g foo.out funcname: sw_striped_sse41_128_64 cutoff: 7 use filter: no gap_extend: 1 gap_open: 10 matrix: (null) AOL: 80 SIM: 40 OS: 30 file: human-1mb.fa query: aluSx1.fa output: foo.out read and pack time: 0.0113 seconds sentinal: $ end of packed buffer: 1000314 number of sequences: 2 number of queries: 1 number of db seqs: 1 enumerate time: 0.0000 seconds unique pairs: 0 omp num threads: 20 openmp prep time: 0.0000 seconds work: 312000000 cells alignment time: 12.0690 seconds gcups: 0.0259 cat foo.out 0,0,312,1000000,275,309,482054
I'll drop you an email with links to the files.
-R