Weird behavior
Hi,
We are using scythe to trim 3' adapters, but we found some very weird behavior with this sequence (in.fq):
@014_1000001169_x1
AAAAAAGATGCCAGTTGAAGAACTGATGGAATTCTCGGGTGCCAAAGAACTAAAG
+014_1000001169_x1
BBBB>>1111B1B1BBBBF1BF1BB1B11BBBBAD3A00A0BBDB00BB0D1AB1
and this adapter FASTA file (adapt.fa):
>RPI10
TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
Running this command:
scythe -o out.fq -m match.txt -a adapt.fa in.fq
produces this FASTQ file (out.fq):
@014_1000001169_x1
N
+
B
Why does scythe trim the entire read? The match file (match.txt) contains:
p(c|s): 1.000000; p(!c|s): 0.000000; adapter: RPI10
014_1000001169_x1
TGGAATTCTCGGGTGCCAAGGAACTCCAG
||||||||||||||||||| |||||  ||
TGGAATTCTCGGGTGCCAAAGAACTAAAG
B11BBBBAD3A00A0BBDB00BB0D1AB1
[1.00, 0.97, 0.97, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.98, 1.00, 0.97, 0.97, 1.00, 0.97, 1.00, 1.00, 1.00, 1.00, 0.97, 0.97, 1.00, 1.00, 0.97, 1.00, 0.97, 1.00, 1.00, 0.97]
So, according to the matched region, it should trim only the 3' end, like this:
@014_1000001169_x1
AAAAAAGATGCCAGTTGAAGAACTGA
+014_1000001169_x1
BBBB>>1111B1B1BBBBF1BF1BB1
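The expected result follows from match.txt: the matched read segment is the last 29 bases of the 55-bp read, so removing it leaves the first 26 bases. The sketch below is a hypothetical check in POSIX sh plus awk (not part of scythe; the variable names are made up for illustration). It takes the read's 3' end, compares it position by position with the first 29 bases of RPI10, and should report 3 mismatches, matching the mismatched positions in the alignment.

```shell
#!/bin/sh
# Hypothetical check, NOT part of scythe: confirm that the read segment shown in
# match.txt is the read's 3' end, and count mismatches against the adapter.
read_seq=AAAAAAGATGCCAGTTGAAGAACTGATGGAATTCTCGGGTGCCAAAGAACTAAAG
adapter=TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
match_len=29                                   # match length from match.txt

read_len=${#read_seq}                          # 55
start=$(( read_len - match_len + 1 ))          # first matched read base: 27
read_tail=$(printf '%s\n' "$read_seq" | cut -c "$start-")
adapter_head=$(printf '%s\n' "$adapter" | cut -c "1-$match_len")

echo "match covers read bases $start-$read_len: $read_tail"

# Count positions where the adapter prefix and the read's 3' end disagree.
printf '%s %s\n' "$adapter_head" "$read_tail" | awk '{
  n = 0
  for (i = 1; i <= length($1); i++)
    if (substr($1, i, 1) != substr($2, i, 1)) n++
  print "mismatches:", n
}'
```

Everything before base 27 (26 bases) is what a 3' trim at this match should keep.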
Hi Daniel,
Thanks for reporting this — I'll take a look. I think you had emailed me regarding this earlier; my sincerest apologies for my delayed response (I've been slammed with work lately!). I'll try to take a look at this issue this week.
Daniel,
You probably want to lower the -M value. The correctly trimmed fragment you show above is 26 bp long, and scythe by default only keeps reads longer than 30 bp. If you want to keep them all, use -M 1.
For me, scythe -o out.fq -m match.txt -M 1 -a adapt.fa in.fq gives the following out.fq:
@014_1000001169_x1
AAAAAAGATGCCAGTTGAAGAACTGA
+
BBBB>>1111B1B1BBBBF1BF1BB1
Hope that helps,
Cheers, Kevin
My bad, make that 35bp by default :smile:
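To make the arithmetic concrete, here is a minimal POSIX sh sketch of the trim-then-filter behavior described above. It is not scythe's source code: the function name trim_3prime is hypothetical, the match length is passed in directly instead of being found by scythe's alignment, and treating a trimmed length equal to the cutoff as too short is an assumption. The single "N" placeholder mirrors the out.fq shown in the report.

```shell
#!/bin/sh
# Hypothetical sketch of the behavior discussed in this thread; NOT scythe's code.
# Trim a 3' adapter match from a read, then apply a minimum-length cutoff.
# Assumption: a read whose trimmed length is <= min_keep is replaced by a
# single "N", mirroring the out.fq shown in the report.
trim_3prime() {
  seq=$1         # read sequence
  match_len=$2   # length of the 3' adapter match (29 in match.txt)
  min_keep=$3    # minimum length cutoff, playing the role of -M
  keep=$(( ${#seq} - match_len ))   # bases left after removing the match
  if [ "$keep" -le 0 ] || [ "$keep" -le "$min_keep" ]; then
    echo "N"                        # too short: emit a placeholder base
  else
    printf '%s\n' "$seq" | cut -c "1-$keep"   # keep the 5' part of the read
  fi
}

read_seq=AAAAAAGATGCCAGTTGAAGAACTGATGGAATTCTCGGGTGCCAAAGAACTAAAG
trim_3prime "$read_seq" 29 35   # default cutoff: prints N
trim_3prime "$read_seq" 29 1    # like -M 1: prints AAAAAAGATGCCAGTTGAAGAACTGA
```

With 55 - 29 = 26 bases left, any cutoff of 26 or more turns this read into the placeholder, which is why the default run printed N while -M 1 kept the fragment.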
Hi Vincent,
We have used scythe in our analyses, but after seeing this I am concerned about relying on it. Hopefully you can help us.
Thanks,
Daniel
(In reply to Vince Buffalo's comment of 2014-10-28 3:50 GMT-02:00.)
Daniel Guariz Pinheiro, Assistant Professor, PhD (FCAV/Unesp)
Daniel,
Please see @kdmurray91's comments; he is correct, this is not a bug. Your adapter match is 29 bases long, so only 26 bases remain after trimming, which is below the default -M of 35 bp.
Oh...
OK, I see @kdmurray91's comments. Now I understand: because of the -M option, this read is reduced to a single base. Sorry, I had assumed that reads like this (shorter than -M) would be removed from the output file.
Thanks,
Daniel
(In reply to Vince Buffalo's comment of 2014-10-28 16:00 GMT-02:00.)
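If, as Daniel expected, you want reads that fell below -M dropped rather than kept as a one-base "N" placeholder, one option is to post-filter out.fq. This is a minimal sketch, not a scythe feature: the function name drop_placeholders is hypothetical, and it assumes unwrapped 4-line FASTQ records whose placeholder sequence line is exactly "N", as in the output shown above.

```shell
#!/bin/sh
# Hypothetical post-filter, NOT part of scythe: drop FASTQ records whose
# sequence line is exactly "N" (the placeholder seen for reads shorter than -M).
# Assumes unwrapped 4-line FASTQ records; reads stdin or the given files.
drop_placeholders() {
  awk '
    { rec[NR % 4] = $0 }                 # buffer lines of the current record
    NR % 4 == 0 {                        # 4th line completes a record
      if (rec[2] != "N")                 # line 2 holds the sequence
        print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }
  ' "$@"
}
```

Usage, with the file name from the report and a hypothetical output name: drop_placeholders out.fq > out.filtered.fq. Note that a genuine one-base read consisting of a single N would also be dropped.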