feature request: support for indels in adapter sequences
While testing Scythe on my RNA-seq library, I came across a number of reads that still carry a fragment of adapter at the 3' end. Scythe seems to have missed these because of a single-base deletion in the read.
Example:
read    = [...] GAACTTCCTGTGAAATACTTTGACGTGTCAGTCCTTCC[end]
adapter =                               GTGTCAGTCACTTCCAGC
                                                 ^ (this adapter base, A, is missing from the read)
Quality =       CCABCDDDCD@>C@CCCDDDEDCDB8<ABCC>CCC@C9
(Illumina 1.8+)
The reads I have come across that look like this are otherwise good, and align well except for the lingering 3' adapter.
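To make the effect of the deletion concrete, here is a small Python sketch of a plain ungapped comparison between the read's 3' tail and the adapter. It is only an illustration: the function name `ungapped_mismatches` is made up for this post, and this is not Scythe's actual scoring, which is probabilistic and uses the base qualities.

```python
# Illustration only: a plain ungapped comparison, NOT Scythe's actual scoring.
read = "GAACTTCCTGTGAAATACTTTGACGTGTCAGTCCTTCC"
adapter = "GTGTCAGTCACTTCCAGC"


def ungapped_mismatches(read, adapter, start):
    """Compare read[start:] with the adapter position by position (no gaps).

    Returns (number of mismatches, length of the overlap).
    """
    tail = read[start:]
    overlap = min(len(tail), len(adapter))
    mismatches = sum(1 for r, a in zip(tail, adapter) if r != a)
    return mismatches, overlap


# Anchor on the first 9 adapter bases, which are intact in this read.
start = read.find(adapter[:9])
mismatches, overlap = ungapped_mismatches(read, adapter, start)
print(f"adapter starts at read offset {start}: "
      f"{mismatches} mismatches over {overlap} bases")
```

Everything after the missing A is shifted by one position, so a single deletion shows up as 3 mismatches in the 14 overlapping bases, all of them in the last 5 bases of the read.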
Unless I'm missing a parameter, this seems like something Scythe could (and probably should) support in the future. Support for miscalled or mismatched bases in the 3' adapter is critical, and I applaud Scythe for leading the way in taking a probabilistic approach to identifying and trimming these contaminants.
However, there is clearly also some probability that a base has been inserted into (or deleted from) the adapter sequence in the read, and this could be accounted for as well.
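As a rough idea of what indel-aware matching could look like, here is a Python sketch that computes the minimum number of edits (substitutions, insertions, deletions) needed to turn the read's 3' tail into some prefix of the adapter. This is a standard edit-distance dynamic program, not anything taken from Scythe; the function name `min_edits_to_adapter_prefix` is hypothetical, and the sketch ignores base qualities, which a real probabilistic model would presumably fold in.

```python
# Illustration only: edit-distance matching against adapter prefixes.
# This is NOT how Scythe scores matches, and it ignores base qualities.
read = "GAACTTCCTGTGAAATACTTTGACGTGTCAGTCCTTCC"
adapter = "GTGTCAGTCACTTCCAGC"


def min_edits_to_adapter_prefix(tail, adapter):
    """Minimum edit distance between `tail` and any prefix of `adapter`."""
    # prev[j] = edit distance between the tail consumed so far and adapter[:j]
    prev = list(range(len(adapter) + 1))
    for i, r in enumerate(tail, 1):
        curr = [i]
        for j, a in enumerate(adapter, 1):
            curr.append(min(
                prev[j] + 1,             # extra base in the read (insertion)
                curr[j - 1] + 1,         # adapter base missing from the read (deletion)
                prev[j - 1] + (r != a),  # match or substitution
            ))
        prev = curr
    # The read may end anywhere inside the adapter, so take the best prefix.
    return min(prev)


# Anchor on the first 9 adapter bases, which are intact in this read.
tail = read[read.find(adapter[:9]):]
print(min_edits_to_adapter_prefix(tail, adapter))
```

For the read in my example this reports 1 edit (the deleted A), so the tail is recognised as adapter with a single indel instead of a run of apparent mismatches.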