
feature request: support for indels in adapter sequences

Open crmackay opened this issue 11 years ago • 0 comments

While testing Scythe on my RNA-seq library, I came across a number of reads that still have a fragment of adapter on the 3' end. Scythe appears to have missed these because of a single-base deletion in the read.

Example:


read = [...] GAACTTCCTGTGAAATACTTTGACGTGTCAGTCCTTCC[end]
adapter =                            GTGTCAGTCACTTCCAGC
                                              ^

Quality =    CCABCDDDCD@>C@CCCDDDEDCDB8<ABCC>CCC@C9
(Illumina 1.8+)

The reads I have found that look like this are otherwise good and align well, apart from the lingering 3' adapter.

Unless I'm missing a parameter, this seems to be something that Scythe could/should support in the future. Support for miscalled/unmatched bases in the 3' adapter is critical, and I applaud Scythe for leading the way in taking a probabilistic approach to identifying and removing these adapter fragments.

However, there is clearly a nonzero probability that a base has been inserted into (or deleted from) the adapter portion of a read as well, and this could also be accounted for somehow.
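To make the request concrete, here is a minimal sketch of the difference between an ungapped comparison and one that tolerates indels, using the read and adapter from the example. It is not how Scythe works internally: it uses plain edit distance rather than a probabilistic model, it ignores the quality string, and the function names and the start offset (24, taken from the alignment shown in the example) are my own assumptions for illustration.

```python
def hamming_mismatches(read_suffix, adapter):
    """Count mismatches in an ungapped comparison of the read's 3' suffix
    against the start of the adapter (compared up to the shorter length)."""
    return sum(r != a for r, a in zip(read_suffix, adapter))


def prefix_edit_distance(read_suffix, adapter):
    """Minimum edit distance between read_suffix and ANY prefix of adapter.

    The read may end partway through the adapter, so the adapter's tail is
    free to remain unmatched; the adapter's start is not.
    """
    # prev[j] = edit distance between the read bases consumed so far
    # and adapter[:j]; with no read bases consumed this is just j.
    prev = list(range(len(adapter) + 1))
    for i, r in enumerate(read_suffix, start=1):
        curr = [i]  # i read bases against an empty adapter prefix
        for j, a in enumerate(adapter, start=1):
            curr.append(min(
                prev[j - 1] + (r != a),  # match or substitution
                prev[j] + 1,             # extra base in the read (insertion)
                curr[j - 1] + 1,         # adapter base missing from the read (deletion)
            ))
        prev = curr
    # Best score over every adapter prefix length.
    return min(prev)


read = "GAACTTCCTGTGAAATACTTTGACGTGTCAGTCCTTCC"
adapter = "GTGTCAGTCACTTCCAGC"
suffix = read[24:]  # "GTGTCAGTCCTTCC", where the adapter begins in the example

print(hamming_mismatches(suffix, adapter))    # 3
print(prefix_edit_distance(suffix, adapter))  # 1
```

In the ungapped comparison the single missing base shifts everything after it, so the 14-base overlap shows 3 mismatches; allowing one deletion brings the cost down to a single edit. A real implementation would presumably need to weight indels and substitutions by their probabilities and fold in the base qualities, which this sketch does not attempt.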

crmackay, Aug 15 '14 17:08