duplex-tools Unexpected base in duplex call

Hello, I extracted FASTQ from duplex_orig.sam and compared the output to the original raw reads. In the alignment below, the top sequence is the output for the duplex read, and beneath it are the two corresponding reads; the last read is reverse complemented. The highlighted region shows 'A' in the duplex output, while it shows 'C' in read1 and 'T' in the reverse complement of read2. My understanding is that, at this locus, the duplex call should be either C or T, but not A. Can you please share some insights? Thanks -Dev [image: unknown_snp_duplex]

Apr 27 '23 19:04 dpaudel-tb

Could it be an alignment issue in this view? The AT from the duplex fits with read2 revcomp a bit to the right. Without seeing more bases to the right, it seems here that the duplex read is missing 2 bases.

Apr 27 '23 20:04 HenrivdGeest

Yes, here are more sequences to the right: [image: unknown_snp_duplex_longer]

Apr 27 '23 20:04 dpaudel-tb

Hi @dpaudel-tb,

The assumption that there is a trivial relationship between the simplex calls and the duplex calls is incorrect. The duplex call is not formed trivially from the simplex calls.

Consider the inference of decoding a single simplex signal into a basecall. I can find the most likely basecall that explains the observed signal. I can do this for the second strand signal too. Those inference problems are independent (at least they are treated as such -- they are not informed by each other).

A duplex caller however is attempting to find a single basecall that explains both observed signals simultaneously. Although clearly in the limit of complete information all three calls would be identical, when variation is taken into account the calls can differ.
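As a rough illustration of how this can happen, here is a toy sketch in Python. It is not how the real duplex caller works (that operates on whole signals, not on one position in isolation), and every number in it is made up. It assumes each strand yields a probability per candidate base at one position, with the second strand already expressed in the first strand's orientation, and it approximates the joint call by multiplying the two sets of probabilities.

```python
# Toy single-position example; all probabilities below are invented.
BASES = "ACGT"

# Hypothetical per-base probabilities from each strand's signal.
# strand2 is assumed to be already reverse complemented.
strand1 = {"A": 0.35, "C": 0.40, "G": 0.15, "T": 0.10}
strand2 = {"A": 0.35, "C": 0.10, "G": 0.15, "T": 0.40}


def best_base(probs):
    """Return the base with the highest probability."""
    return max(BASES, key=lambda b: probs[b])


def joint_probs(p1, p2):
    """Combine two strands by multiplying per-base probabilities.

    Assumes the two signals are independent given the true base and
    that all bases are equally likely a priori (a simplification).
    """
    raw = {b: p1[b] * p2[b] for b in BASES}
    total = sum(raw.values())
    return {b: raw[b] / total for b in BASES}


simplex1 = best_base(strand1)                       # each strand on its own
simplex2 = best_base(strand2)
duplex = best_base(joint_probs(strand1, strand2))   # both strands together

print(simplex1, simplex2, duplex)
```

With these invented numbers the two independent calls are C and T, while the joint call is A: A is the runner-up on both strands, so it is the only base that explains both signals reasonably well.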

Apr 27 '23 20:04 cjw85

Thank you @cjw85 for your insights. Unfortunately, I do not have 'ground truth' data for these sequences so I am also not sure what the actual bases should be.

May 02 '23 19:05 dpaudel-tb