
Can sangeranalyseR handle spurious indels?

Open joelnitta opened this issue 4 years ago • 1 comments

It is difficult to provide a reprex for this, as I don't want to just paste my sequences here. But I am wondering whether sangeranalyseR is the appropriate tool when there may be spurious indels in the raw reads. I am sequencing a single coding gene (ca. 1300 bp) with no introns, using four primers (a pair of sequencing primers and a pair of internal primers), so the consensus should just be a single coding region with no indels. I am trying to use a reference AA sequence (refAminoAcidSeq), but I still have problems with obviously wrong frame shifts: portions of the contig where things are off by a single base, resulting in a completely jumbled consensus. I know that sangeranalyseR can't be used to edit the reads, but I'm surprised it apparently can't handle these sorts of sequencing errors, which are very common in my experience. Or could it just be my settings? Please let me know if I should share a few example reads via email or similar.

joelnitta, Apr 13 '20 06:04

Hi @joelnitta,

If you could share an example with us via email (the smallest example that doesn't work the way you want, but that seems like something we could implement), we'll take a look.

I agree that in principle this doesn't seem like it should be too hard. Simply matching a sequence with indels against a back-translated AA reference sequence should fix the problem, right? It's been a while since I wrote the refAminoAcidSeq feature, so I'll have to dig into what it's doing. Then we can see if we can improve it.

Rob

roblanf, Apr 13 '20 23:04
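
The idea raised in the reply above, using the amino acid reference to spot and undo a frame shift, can be sketched roughly as follows. This is plain Python rather than R, and it is not how sangeranalyseR treats refAminoAcidSeq. The function names (`translate`, `match_score`, `repair_single_indel`) and the toy sequences are hypothetical, and the sketch assumes the read starts in frame and contains at most one spurious single-base indel.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate frame 1 of a DNA string; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def match_score(dna, ref_aa):
    """Count positions where the translated read agrees with the reference protein."""
    return sum(a == b for a, b in zip(translate(dna), ref_aa))


def repair_single_indel(dna, ref_aa):
    """Brute force: try every single-base deletion and every single 'N' insertion,
    and keep the candidate whose translation best matches the reference."""
    candidates = [dna]  # the unedited read is kept if nothing scores higher
    candidates += [dna[:i] + dna[i + 1:] for i in range(len(dna))]
    candidates += [dna[:i] + "N" + dna[i:] for i in range(len(dna) + 1)]
    return max(candidates, key=lambda cand: match_score(cand, ref_aa))


# Toy data (made up for illustration): a 10-residue reference and a read
# carrying one spurious extra base.
reference = "MKTAYIAKQR"
clean = "ATGAAAACCGCGTATATTGCGAAACAGCGT"
read = clean[:10] + "G" + clean[10:]

repaired = repair_single_indel(read, reference)
print(translate(read))      # frame-shifted translation
print(translate(repaired))  # translation after the best single edit
```

Several different edits can restore the same protein (for example, deleting either base of a homopolymer, or a base at a synonymous third codon position), so the repaired DNA is not guaranteed to equal the true sequence. Real reads with multiple indels or an unknown starting frame would need a frame-aware alignment against the reference instead of this exhaustive search.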