dorado icon indicating copy to clipboard operation
dorado copied to clipboard

Very few duplex reads with short amplicons

Open HenkvdMeulen opened this issue 1 year ago • 16 comments

Somehow I'm ending up with a very low number of duplex reads after using Dorado duplex basecalling with the following command (used the ligation sequencing kit V14 SQK-LSK114):

dorado duplex [email protected] pod5/directory/ > duplex.bam

I extracted the duplex reads from this file using samtools view duplexBarcoded/barcode01.bam | grep "dx:i:1"| samtools view -O BAM -o duplexOnly.bam

However, only 615 out of nearly 2M reads turn out to be duplex reads. What could be the issue here? Does anyone have suggestions as to what might be going wrong and how I could solve it? Thanks in advance!

HenkvdMeulen avatar Jun 28 '23 09:06 HenkvdMeulen

Hi @HenkvdMeulen

A couple of questions:

  1. What flowcells are you using? Are these high duplex flowcells?
  2. How long are your reads? What is their N50?

vellamike avatar Jun 28 '23 09:06 vellamike

Hi @vellamike

  1. I'm using the R10.4.1 flow cell. is there any difference among those, I would assume not?
  2. The amplicon length is expected to be around 450 bp with primers, tails and barcodes. For some reason the MinION did not produce a run report that I can check now, However, there was a very tight peak in the read length around the expected amplicon size. hope that is sufficiently helpful, sorry! I tried to retrieve the report from MinKnow but it doesn't show the experiment anymore, even though I am logged in and changed the number of past days to the right amount.

HenkvdMeulen avatar Jun 28 '23 13:06 HenkvdMeulen

Hi @HenkvdMeulen,

Dorado has a known issue whereby duplex basecalling of short reads (<1kb) results in low yields. This is an result of the duplex algorithm which has been optimised for longer reads. We are working on a fix and will update this issue when we release it, but at present what you are trying to do will unfortunately not work very well.

Mike

vellamike avatar Jun 28 '23 13:06 vellamike

Hi @vellamike, That's good to know, thanks! I will patiently wait for the update to arrive. Do you have any estimation on the time until we could expect this fix?

Best regards, Henk

HenkvdMeulen avatar Jun 28 '23 14:06 HenkvdMeulen

Hi @vellamike, is there any progress yet in regard to this issue? Do you know when this fix will be available in an update? Thanks!

Best regards, Henk

HenkvdMeulen avatar Aug 06 '23 14:08 HenkvdMeulen

Hi @vellamike

I'm also trying to perform duplex basecalling on reads < 1kb. It is tagging most reads with dx:i:0 (simplex without duplex offpring) and some reads with dx:i:-1 (simplex reads which have duplex offsprings), but no reads actually get tagged with dx:i:1 (duplex). Shouldn't it produce some duplex reads tagged with dx:i:1 if there are dx:i:-1 simplex reads present?

Thanks

quintonwessells avatar Aug 29 '23 17:08 quintonwessells

Hi @quintonwessells which version of dorado are you using? your issue sounds like a bug we fixed in 0.3.4

we haven't made any explicit improvements for short read duplex yet. we made some updates to long read duplex yield in the most recent release, and we are in the process of discussing short read related changes.

tijyojwad avatar Aug 29 '23 23:08 tijyojwad

Hi @tijyojwad, I was using 0.3.3. Just tried it with 0.3.4 and now I'm not getting any dx:i:-1 or dx:i:1 reads, so seems that is sort of fixed even though I can tell some of the reads marked as simplex are actually duplex. Thanks for keeping us posted on status of short read related changes!

quintonwessells avatar Aug 30 '23 05:08 quintonwessells

even though I can tell some of the reads marked as simplex are actually duplex

what do you mean by this?

tijyojwad avatar Sep 05 '23 11:09 tijyojwad

I think I might be running into the same length limitation. I was playing around with forcing dorado to duplex call reads I knew were complementary to each other by passing the --pairs flag, but it appears that the reads are filtered. Below is a partial output of my experimental run. Is dorado filtering by length or is there another criteria I'm missing that it's using to reject this pair?

[2023-09-07 11:26:51.844] [info] > Starting Stereo Duplex pipeline [2023-09-07 11:26:51.853] [info] > Reading read channel info [2023-09-07 11:26:51.854] [info] > Processed read channel info [2023-09-07 11:26:54.204] [debug] hits 1, mapq 60, overlap length 316, overlap frac 0.8885542, delta -1103, read 1 376, read 2 332, strand -, pass true, accepted true, temp start 27 temp end 343, comp start 18 comp end 313, 7239544e-c176-4d8b-93d4-8cc9da1bac8f and f2f76b5d-0c16-453b-9720-2efd1ca01273 [2023-09-07 11:26:54.964] [info] > Duplex reads basecalled: 0 [2023-09-07 11:26:54.964] [info] > Duplex reads filtered: 1 [2023-09-07 11:26:54.964] [info] > Duplex rate: 0.56497175% [2023-09-07 11:26:54.964] [info] > Basecalled @ Bases/s: 3.552131e+01

alexanderwg-ornl avatar Sep 07 '23 15:09 alexanderwg-ornl

Unless the min-qscore option was specified in the cmdline, the only other filter enabled by default is the min sequence length, which is 5. so even though the pairing is accepted and a duplex read is called, the length could be super small. probably because the model hasn't really seen data of that size so it doesn't know what to do with it

tijyojwad avatar Sep 07 '23 15:09 tijyojwad

Unless the min-qscore option was specified in the cmdline, the only other filter enabled by default is the min sequence length, which is 5. so even though the pairing is accepted and a duplex read is called, the length could be super small. probably because the model hasn't really seen data of that size so it doesn't know what to do with it

Thanks for the reply! According to dorado's help file for duplex, the min-qscore default is 0 so I never altered it; does that mean the filter is off or is it behaving in some other way? I also don't see a minimum sequence length flag in the help documentation, is this an actual option I can set?

alexanderwg-ornl avatar Sep 07 '23 15:09 alexanderwg-ornl

does that mean the filter is off or is it behaving in some other way?

correct, min qscore 0 means no filtering is happened based on the read's mean qscore

I also don't see a minimum sequence length flag in the help documentation, is this an actual option I can set?

unfortunately not. but that minimum length is 5, which we thought was a good threshold and anything smaller than that is most likely garbage anyway. from your log I'd expect the duplex read to be ~200 bp or longer...

tijyojwad avatar Sep 07 '23 15:09 tijyojwad

I see, so it sounds like it is the same problem as the OP: the duplex basecaller is rejecting these shorter alignments. Thanks for your insight!

alexanderwg-ornl avatar Sep 07 '23 16:09 alexanderwg-ornl

@vellamike Do you have any info on when duplex pairing for short amplicons might be ready? Thanks!

Delayed-Gitification avatar Oct 27 '23 14:10 Delayed-Gitification

Apologies for the delay in replying. Duplex pairing for short amplicons is not something which we will deliver soon. It is currently considered a research problem.

vellamike avatar Dec 04 '23 20:12 vellamike