parasail icon indicating copy to clipboard operation
parasail copied to clipboard

SImple DNA search

Open rmhubley opened this issue 8 years ago • 2 comments

I haven't been able to get a simple DNA search to produce a result. I am sure I am doing something silly. I couldn't find a simple invocation example for a DNA search in the docs....so:

./parasail_aligner -d -a sw_striped_sse41_128_64 -f human-1mb.fa -q aluSx1.fa 
            funcname: sw_striped_sse41_128_64
              cutoff: 7
          use filter: yes
          gap_extend: 1
            gap_open: 10
              matrix: (null)
                 AOL: 80
                 SIM: 40
                  OS: 30
                file: human-1mb.fa
               query: aluSx1.fa
              output: parasail.csv
  read and pack time: 0.0121 seconds
            sentinal: $
end of packed buffer: 1000314
 number of sequences: 2
   number of queries: 1
   number of db seqs: 1
     induced SA time: 0.2240 seconds
      naive BWT time: 0.0055 seconds
      clamp LCP time: 0.0119 seconds
            ESA time: 2.5268 seconds
      possible pairs: 1
     generated pairs: 0
        unique pairs: 0
     omp num threads: 20
pairs and vpairs were empty

This 1MB of sequence contains many Alu sequences including the top match found by cross_match:

2538  2.88 0.32 0.64  AluSx1#SINE/Alu        1   312 (0)  C Human   (930660) 69340 69030  

C AluSx1#SINE/Alu       312 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGT 263
                                                                              
  Human               69030 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGT 69079

C AluSx1#SINE/Alu       262 CGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACCTCCGC 213
                                                                       v      
  Human               69080 CGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGC 69129

C AluSx1#SINE/Alu       212 CTCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGAT 163
                                        v  v                                 i
  Human               69130 CTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGAC 69179

C AluSx1#SINE/Alu       162 TACAGGCGCCCGCCACCACGCCCGGCTAATTTTT-GTATTTTTAGTAGAG 114
                                         i                    -               
  Human               69180 TACAGGCGCCCGCTACCACGCCCGGCTAATTTTTTGTATTTTTAGTAGAG 69229

C AluSx1#SINE/Alu       113 ACGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGG 64
                                         i    i      v         v           -- 
  Human               69230 ACGGGGTTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTC--G 69277

C AluSx1#SINE/Alu        63 TGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC 14
                                                                              
  Human               69278 TGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC 69327

C AluSx1#SINE/Alu        13 ACCGCGCCCGGCC 1
                                         
  Human               69328 ACCGCGCCCGGCC 69340

rmhubley avatar Aug 03 '16 21:08 rmhubley

Thanks for trying out parasail. I admit that DNA wasn't the first goal of parasail, but no doubt it should work. If there's a bug, I'll fix it -- thanks for filing an issue!

For starters, it looks like you are running the parasail_aligner application correctly, though you may have missed on command-line parameter. The '-x' parameter turns off the use of a suffix array filter. The filter tries to find pairs of sequences with long exact-matching substrings coming from your input dataset and optional query dataset. I would try running the aligner and turn off the filter, just to be sure.

The other concern I have is that the aligner is reporting one sequence in your input dataset and one sequence in your query dataset. Is this correct? It is a completely valid way to use the tool, I just want to make sure that is indeed what you gave to it as inputs.

Is there any way you could make your input files available to me for my own testing? I really want to get this working for you.

jeffdaily avatar Aug 03 '16 22:08 jeffdaily

I re-ran with the "-x" flag and got one alignment out:

time ./parasail_aligner -d -x -a sw_striped_sse41_128_64 -f human-1mb.fa -q aluSx1.fa -g foo.out funcname: sw_striped_sse41_128_64 cutoff: 7 use filter: no gap_extend: 1 gap_open: 10 matrix: (null) AOL: 80 SIM: 40 OS: 30 file: human-1mb.fa query: aluSx1.fa output: foo.out read and pack time: 0.0113 seconds sentinal: $ end of packed buffer: 1000314 number of sequences: 2 number of queries: 1 number of db seqs: 1 enumerate time: 0.0000 seconds unique pairs: 0 omp num threads: 20 openmp prep time: 0.0000 seconds work: 312000000 cells alignment time: 12.0690 seconds gcups: 0.0259 cat foo.out 0,0,312,1000000,275,309,482054

I'll drop you an email with links to the files.

-R

rmhubley avatar Aug 03 '16 22:08 rmhubley