dada2 Demultiplexing step between "lima" Pacbio tool and "removePrimers()" function

Demultiplexing step between "lima" Pacbio tool and "removePrimers()" function

Open pailloufat-stack opened this issue 1 year ago • 2 comments

Hi Ben

I have a pool of 24 samples (16S Pacbio) CCS reads. I wonder why I have such a difference of reads after the demultiplexing step between lima from smrttools and removePrimers() function from dada2.

There are the number of CCS reads in input (read.in), the filtered CCS after removePrimers() and after lima :

                                                      reads.in  rmPrim()     lima
demux.bc1005--bc1033.fastq.gz    90496     82602        90455
demux.bc1005--bc1035.fastq.gz    78402     71094        78400
demux.bc1005--bc1044.fastq.gz    82046     74074        82042
demux.bc1005--bc1045.fastq.gz    83447     75950        83445
demux.bc1007--bc1033.fastq.gz    94799     87095        94776
demux.bc1007--bc1035.fastq.gz    61092     56384        61089
demux.bc1007--bc1044.fastq.gz    83899     76704        83896
demux.bc1007--bc1045.fastq.gz    70029     64247        70026
demux.bc1008--bc1033.fastq.gz    84304     77480        84295
demux.bc1008--bc1035.fastq.gz    80398     74317        80391
demux.bc1008--bc1044.fastq.gz    79725     72993        79721
demux.bc1008--bc1045.fastq.gz    95786     88758        95783
demux.bc1012--bc1033.fastq.gz    81354     75160        81338
demux.bc1012--bc1035.fastq.gz    66858     61853        66846
demux.bc1012--bc1044.fastq.gz    16823     14374        16822
demux.bc1012--bc1045.fastq.gz    17367     15300        17367
demux.bc1015--bc1033.fastq.gz    15534     12287        15534
demux.bc1015--bc1035.fastq.gz    15805     13202        15804
demux.bc1015--bc1044.fastq.gz    14849     11871        14848
demux.bc1015--bc1045.fastq.gz    16520     12640        16520
demux.bc1020--bc1033.fastq.gz    18154     15402        18154
demux.bc1020--bc1035.fastq.gz    13176     11063        13176
demux.bc1020--bc1044.fastq.gz    15992     13742        15991
demux.bc1020--bc1045.fastq.gz    15607     12940        15606

Both tools work well, without any error. I use the common 16S primers :

fwd <- "AGRGTTYGATYMTGGCTCAG" 
rev <- "RGYTACCTTGTTACGACTT"

Also, if I stay with the removePrimers() function , I read it also orients the reads. Does that mean it is not necessary to start the pipeline with the forward and reverse CCS reads? I can run dada2 with a pool of mixed fwd and rev reads? This could save information.

Best, Vincent

Oct 13 '22 08:10 pailloufat-stack

lima performs demultiplexing of the reads into the samples they came from using barcodes attached to the sequences. It also trims the barcodes.

removePrimers then filters the read set down to those reads in which both the forward and reverse primers were detected, trims the read to the section between the forward and reverse primers, and (by default) orients the reads to all being in the forward-->reverse orientation.

So removePrimers should be run after lima. The level of read loss you are seeing with removePrimers is consistent with what we often observe. You can poke at the options of removePrimers (see the help ?removePrimers) and make the primer matching parameters more lenient if you want to try to keep slightly more reads at that step.

Does that mean it is not necessary to start the pipeline with the forward and reverse CCS reads? I can run dada2 with a pool of mixed fwd and rev reads?

Before removePrimers the CCS reads are in mixed orientation. You could run DADA2 on that, but it isn't recommended as you'll end up with duplicates of every ASV at the end, and slightly reduced sensitivity to rare variants as well. So, removePrimers(..., orient=TRUE) is the default and recommended way to run the command.

Oct 13 '22 15:10 benjjneb

Thanks for your quick and complete feedback! I know lima first trims the barcodes, and can trim the primers in a second run with the primer sequences (https://lima.how/faq/primer.html)

I guess lima and removePrimers do exactly the same thing when they trim the primers. That's why I was wondering there was a such difference between both tools.

Oct 13 '22 15:10 pailloufat-stack

dada2 dada2 copied to clipboard

Demultiplexing step between "lima" Pacbio tool and "removePrimers()" function

dada2
dada2 copied to clipboard