dorado: Differing read counts between simplex and duplex basecalling

Hello, I ran simplex and duplex basecalling on the same dataset (dorado-0.4.1-linux-x64 with [email protected]). I expected the simplex run to give the same number of reads as the duplex run filtered to simplex reads (dx:i:0; dx:i:-1). However, the reported read counts differ. Is this expected, and which simplex reads should be trusted: those from direct simplex basecalling, or the simplex reads filtered out of the duplex basecalling output? Thanks

File	ReadCount	Tags included
simplex.bam	11,130,442
duplex.bam	13,130,264
filtered_duplex_only.bam	1,966,966	dx:i:1
filtered_simplex_only.bam	11,163,298	dx:i:0; dx:i:-1
filtered_simplex_NoDuplex_i0.bam	7,980,822	dx:i:0
filtered_simplex_WithDuplex_i-1.bam	3,182,476	dx:i:-1
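
For anyone who wants to reproduce a per-tag breakdown like the one above, here is a minimal sketch that tallies reads by their `dx:i` value from SAM-formatted text lines. It is not the exact method used for the table. The function name `count_dx_tags` and the example records are made up for illustration, and reads that carry no `dx` tag are counted under `None`.

```python
from collections import Counter


def count_dx_tags(sam_lines):
    """Tally reads by dx tag value from SAM-formatted text lines.

    Header lines (starting with '@') and blank lines are skipped.
    Reads without a dx tag are counted under the key None.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        dx = None
        # Optional tags start at the 12th column of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("dx:i:"):
                dx = int(tag[len("dx:i:"):])
                break
        counts[dx] += 1
    return counts


# Made-up unaligned records, only to show the expected input shape.
example_records = [
    "@HD\tVN:1.6\tSO:unknown",
    "read_a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:1",
    "read_b\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:0",
    "read_c\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:-1",
    "read_d\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:-1",
]

example_counts = count_dx_tags(example_records)
print(dict(example_counts))
```

In practice the lines could come from the text output of `samtools view duplex.bam` instead of the hard-coded list.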

Nov 15 '23 18:11 dpaudel-tb

Hi @dpaudel-tb - the read splitting configuration differs slightly between simplex and duplex basecalling, so a different number of reads can end up being split in each case. That is most likely the root cause of this count discrepancy. I would suggest you go with the simplex reads from the duplex run (dx:i:0 and dx:i:-1).

We'll look at harmonizing the options between the two cases.
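
As a quick sanity check on the numbers reported in the table, the sketch below confirms that the filtered files add up to the duplex total and computes how many more simplex reads the duplex run produced. The variable names are just for illustration. Attributing the extra reads to read splitting is the explanation suggested above, not something the counts alone can prove.

```python
# Read counts copied from the table in the original post.
simplex_run = 11_130_442   # simplex.bam
duplex_total = 13_130_264  # duplex.bam
dx_1 = 1_966_966           # dx:i:1  (duplex reads)
dx_0 = 7_980_822           # dx:i:0  (simplex, no duplex offspring)
dx_minus1 = 3_182_476      # dx:i:-1 (simplex, with duplex offspring)

# Simplex reads recovered from the duplex run.
simplex_from_duplex = dx_0 + dx_minus1

# The three tag classes should account for every read in duplex.bam.
accounted_for = simplex_from_duplex + dx_1 == duplex_total

# Extra simplex reads in the duplex run relative to the simplex-only run.
extra_reads = simplex_from_duplex - simplex_run
extra_fraction = extra_reads / simplex_run

print(f"simplex reads from duplex run: {simplex_from_duplex:,}")
print(f"all duplex.bam reads accounted for: {accounted_for}")
print(f"extra reads vs simplex run: {extra_reads:,} ({extra_fraction:.2%})")
```

The difference works out to 32,856 reads, roughly 0.3% of the simplex-only run.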

Nov 16 '23 13:11 tijyojwad

Thank you @tijyojwad!

Nov 16 '23 14:11 dpaudel-tb