
duplex and barcoding

Open jts opened this issue 1 year ago • 12 comments

Hi,

Will the current version of dorado take barcodes into account when identifying duplex pairs, or does this need to be done separately and provided with the --pairs option to dorado duplex?

Thanks

jts avatar Jan 26 '24 14:01 jts

No, barcodes are not taken into account in pairing, although I don't see why that would matter. What's the specific problem you're trying to solve?

vellamike avatar Jan 26 '24 15:01 vellamike

I'm working on a specific application where the input library is low-ish complexity (not as bad as an amplicon, but not as good as WGS), so I'm worried about the rate of false pairing. I'd like to barcode many such samples together and use the barcodes to reduce the chance of false pairs being called as duplex reads.

jts avatar Jan 26 '24 16:01 jts

I see what you mean. In your case your intuition is correct: you should produce a pairs file and pass it with --pairs. You'll need to write your own script to produce it.
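A minimal sketch of what such a script could look like, assuming you already have a list of candidate (template, complement) read ID pairs and a barcode assignment for each simplex read. The function names, the `barcode_of` mapping, and the space-delimited one-pair-per-line output layout are assumptions for illustration; check `dorado duplex --help` for the format `--pairs` expects in your version.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_pairs_by_barcode(
    candidate_pairs: Iterable[Pair], barcode_of: Dict[str, str]
) -> List[Pair]:
    """Keep only pairs whose two reads were assigned the same barcode.

    Pairs where either read has no barcode assignment are dropped,
    since there is no evidence they come from the same sample.
    """
    kept = []
    for template_id, complement_id in candidate_pairs:
        template_bc = barcode_of.get(template_id)
        complement_bc = barcode_of.get(complement_id)
        if template_bc is None or complement_bc is None:
            continue
        if template_bc == complement_bc:
            kept.append((template_id, complement_id))
    return kept


def format_pairs(pairs: Iterable[Pair]) -> str:
    """Render pairs one per line as 'template complement' (assumed layout)."""
    return "".join(f"{t} {c}\n" for t, c in pairs)


if __name__ == "__main__":
    # Hypothetical read IDs and barcode labels, for illustration only.
    candidates = [("read-a", "read-b"), ("read-c", "read-d")]
    barcodes = {
        "read-a": "barcode01",
        "read-b": "barcode01",
        "read-c": "barcode01",
        "read-d": "barcode02",
    }
    print(format_pairs(filter_pairs_by_barcode(candidates, barcodes)), end="")
```

Where the candidate pairs come from (for example, timing and channel information for consecutive reads) is left out here; the sketch only shows the barcode consistency check.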

vellamike avatar Jan 26 '24 16:01 vellamike

@jts you can also consider the following -

  1. run your dataset through simplex basecalling with barcoding enabled to split the dataset by barcode: dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir classify
  2. then fetch the read ids per barcode from the corresponding .bam and put them in a reads.txt file
  3. run dorado duplex <model> <pod5> --read-ids reads.txt, which will run duplex basecalling only on the read ids from that barcode
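Step 2 could be sketched as follows. This assumes the BAM records have been converted to SAM text (for example with `samtools view`) and that each classified record carries a `BC:Z:` tag naming its barcode; both are assumptions, and if your files lack that tag you can instead collect the read IDs from each per-barcode file separately. The function name and the "unclassified" fallback label are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_ids_by_barcode(sam_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group unique read IDs (SAM QNAME, column 1) by their BC:Z: tag value.

    Header lines (starting with '@') and blank lines are skipped.
    Records without a BC:Z: tag are filed under 'unclassified'.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_id = fields[0]
        if read_id in seen:
            continue  # keep each read ID only once
        seen.add(read_id)
        barcode = "unclassified"
        # Optional tags start after the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith("BC:Z:"):
                barcode = tag[len("BC:Z:"):]
                break
        groups[barcode].append(read_id)
    return dict(groups)


def write_read_id_files(groups: Dict[str, List[str]], prefix: str = "reads") -> None:
    """Write one '<prefix>.<barcode>.txt' file per barcode, one read ID per line."""
    for barcode, read_ids in groups.items():
        with open(f"{prefix}.{barcode}.txt", "w") as handle:
            handle.write("\n".join(read_ids) + "\n")
```

Each resulting file can then be passed to `dorado duplex <model> <pod5> --read-ids <file>` as in step 3, one barcode at a time.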

tijyojwad avatar Jan 29 '24 17:01 tijyojwad

@jts if you want to generate a pairs file yourself here's how I did it: https://github.com/nanoporetech/dorado/issues/368#issuecomment-1900743490

shenker avatar Jan 29 '24 17:01 shenker

Great, thanks @tijyojwad and @shenker

jts avatar Jan 29 '24 17:01 jts

Hi,

I don't see why I need to do basecalling twice. Can I first do dorado duplex > bam and then dorado demux to demultiplex the bam file into many barcodes folders?

Thanks.

lagphase avatar Feb 13 '24 23:02 lagphase

Hi @lagphase - that will work for the simplex reads, but will most likely result in all duplex reads ending up unclassified, since the pairing/duplex algorithm will strip the barcode information.

tijyojwad avatar Feb 13 '24 23:02 tijyojwad

Hi @tijyojwad, thanks for your quick response. Then would you recommend I use --no-trim when doing simplex basecalling?

lagphase avatar Feb 14 '24 00:02 lagphase

@lagphase depends on what you're trying to do -

  1. dorado duplex is not set up to do any adapter/primer trimming or barcode classification yet. So any reads generated through dorado duplex are effectively run with --no-trim.
  2. If you are running dorado basecaller and want to barcode post basecalling, please run with --no-trim.

However, even if the barcodes are left untrimmed in the simplex reads, the duplex reads will still lack them, simply because of how we find the overlapping parts of the duplex read. The barcodes may make it past the overlapping stage, but likely not. So until we add duplex barcoding to dorado, a more robust approach is the one described here - https://github.com/nanoporetech/dorado/issues/600#issuecomment-1915188395

tijyojwad avatar Feb 14 '24 02:02 tijyojwad

@tijyojwad that helps! thank you.

lagphase avatar Feb 14 '24 21:02 lagphase

@jts you can also consider the following -

  1. run your dataset through simplex basecalling with barcoding enabled to split the dataset by barcode: dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir classify
  2. then fetch the read ids per barcode from the corresponding .bam and put them in a reads.txt file
  3. run dorado duplex <model> <pod5> --read-ids reads.txt, which will run duplex basecalling only on the read ids from that barcode

Just to clarify: when carrying out step 1 with dorado basecaller, should the --no-trim option be added? You haven't included it in the command, but the GitHub page recommends using --no-trim if you want to demultiplex later, so I'm a bit confused about the correct way to proceed.

luckybillion avatar Jul 23 '24 20:07 luckybillion