Unexpected base in duplex call
Hello,
I extracted FASTQ from duplex_orig.sam and compared the output to the original raw reads.
In the following alignment file, the top sequence is the output for the duplex read. Below it are the two corresponding reads. The last read is reverse complemented.
The highlighted region shows 'A' in the duplex output, while it shows 'C' in read1 and 'T' in the reverse complement of read2.
My understanding is, for this locus, the duplex should show either C or T but not A.
Can you please share some insights?
Thanks
-Dev

Could it be an alignment issue in this view? The AT from the duplex fits with read2 revcomp a bit to the right. Without seeing more bases to the right, it seems here that the duplex read is missing 2 bases.
Yes, here are more sequences to the right:

Hi @dpaudel-tb,
The assumption that there is a trivial relationship between the simplex calls and the duplex call is incorrect. The duplex call is not formed trivially from the simplex calls.
Consider the inference of decoding a single simplex signal into a basecall. I can find the most likely basecall that explains the observed signal. I can do this for the second strand signal too. Those inference problems are independent (at least they are treated as such; they are not informed by each other).
A duplex caller however is attempting to find a single basecall that explains both observed signals simultaneously. Although clearly in the limit of complete information all three calls would be identical, when variation is taken into account the calls can differ.
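To make this concrete, here is a toy sketch in Python. It is not how the actual duplex caller works; the per-base likelihoods are made-up numbers, and the names `template_lik`, `complement_lik`, `best_base` and `joint_best_base` are hypothetical. It assumes the two signals are conditionally independent given the base and that the second strand is already expressed in the orientation of the first. It only illustrates that the base which best explains both signals together need not be the best base for either signal alone.

```python
# Toy illustration only: made-up likelihoods P(signal | base) at one position.
# The second strand is assumed to be already reverse complemented so that both
# dictionaries refer to the same base in the same orientation.
BASES = "ACGT"

template_lik = {"A": 0.35, "C": 0.40, "G": 0.05, "T": 0.20}
complement_lik = {"A": 0.35, "C": 0.20, "G": 0.05, "T": 0.40}


def best_base(lik):
    """Most likely base given one signal on its own (simplex-style decision)."""
    return max(BASES, key=lambda b: lik[b])


def joint_best_base(lik_1, lik_2):
    """Most likely base given both signals, assuming they are independent
    given the true base (a simplifying assumption for this sketch)."""
    joint = {b: lik_1[b] * lik_2[b] for b in BASES}
    return max(BASES, key=lambda b: joint[b])


simplex_1 = best_base(template_lik)
simplex_2 = best_base(complement_lik)
duplex = joint_best_base(template_lik, complement_lik)

print(simplex_1, simplex_2, duplex)
```

With these numbers each signal alone favours a different base, but 'A' is a close second for both, so it has the highest joint score. A real caller works on whole sequences rather than single positions, so this is only the simplest version of the effect.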
Thank you @cjw85 for your insights. Unfortunately, I do not have 'ground truth' data for these sequences so I am also not sure what the actual bases should be.