Simplex read id in multiple duplex pairs

Open davidebolo1993 opened this issue 9 months ago • 21 comments

Hi guys,

We are investigating our first duplex run. I've read the useful discussions in #316 and #327 but couldn't find an obvious explanation for what we see. From the docs and issues, we understand that a simplex read (let's call it r) that is also part of a duplex pair (let's call it d) is tagged dx:i:-1, while the corresponding duplex read d is named after its two parent reads (t;r, where t and r are the read ids) and is tagged dx:i:1. An example is read 0ee988dd-2227-47f7-ab19-99acfc66d686, shown here with its duplex read and the corresponding dx values:

d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686	1
0ee988dd-2227-47f7-ab19-99acfc66d686	-1

So far so good, and indeed most of the simplex reads having a duplex pair follow this scheme.

There are, however, simplex reads (dx:i:-1) that belong to multiple duplex pairs, so that read r appears in a first duplex r;t and in a second duplex q;r. An example is read d69f94b2-51d2-4c61-8c3b-7104c6cccc2a:

d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686	1
ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	1
d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	-1
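
For reference, this is roughly how such reads can be picked out of a two-column table like the one above (read name, dx value). It is only a minimal sketch: the function name `duplex_pairs_per_read` and the tab-separated input are assumptions for illustration, not part of dorado.

```python
from collections import defaultdict


def duplex_pairs_per_read(lines):
    """Map each parent read id to the duplex read names it appears in.

    Assumes each line is "<read name>\t<dx value>" and that duplex reads
    (dx == 1) are named "<id>;<id>", as in the examples above.
    """
    pairs = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, dx = line.split("\t")
        if int(dx) == 1 and ";" in name:
            for parent in name.split(";"):
                pairs[parent].append(name)
    return pairs


rows = [
    "d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686\t1",
    "ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a\t1",
    "d69f94b2-51d2-4c61-8c3b-7104c6cccc2a\t-1",
]

pairs = duplex_pairs_per_read(rows)

# Keep only the parent reads that show up in more than one duplex read.
multi = {read_id: names for read_id, names in pairs.items() if len(names) > 1}
for read_id, names in multi.items():
    print(read_id, len(names))
```

On the three rows above, only d69f94b2-51d2-4c61-8c3b-7104c6cccc2a is reported, with two duplex reads.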

What is happening here? Do the other 2 ids basically refer to the same template read d69f94b2-51d2-4c61-8c3b-7104c6cccc2a, as partial duplexes of 2 different parts of it? Something like this (https://github.com/nanoporetech/dorado/issues/327#issuecomment-1691714958), but at different ends? I'm just guessing, as I couldn't find anything related to this. Sorry if I missed it.

Thanks,

Davide

davidebolo1993, Oct 02 '23 11:10