dada2
dada2 copied to clipboard
Demultiplexing step between "lima" Pacbio tool and "removePrimers()" function
Hi Ben
I have a pool of 24 samples (16S Pacbio) CCS reads. I wonder why I have such a difference of reads after the demultiplexing step between lima
from smrttools and removePrimers()
function from dada2.
There are the number of CCS reads in input (read.in
), the filtered CCS after removePrimers()
and after lima
:
reads.in rmPrim() lima
demux.bc1005--bc1033.fastq.gz 90496 82602 90455
demux.bc1005--bc1035.fastq.gz 78402 71094 78400
demux.bc1005--bc1044.fastq.gz 82046 74074 82042
demux.bc1005--bc1045.fastq.gz 83447 75950 83445
demux.bc1007--bc1033.fastq.gz 94799 87095 94776
demux.bc1007--bc1035.fastq.gz 61092 56384 61089
demux.bc1007--bc1044.fastq.gz 83899 76704 83896
demux.bc1007--bc1045.fastq.gz 70029 64247 70026
demux.bc1008--bc1033.fastq.gz 84304 77480 84295
demux.bc1008--bc1035.fastq.gz 80398 74317 80391
demux.bc1008--bc1044.fastq.gz 79725 72993 79721
demux.bc1008--bc1045.fastq.gz 95786 88758 95783
demux.bc1012--bc1033.fastq.gz 81354 75160 81338
demux.bc1012--bc1035.fastq.gz 66858 61853 66846
demux.bc1012--bc1044.fastq.gz 16823 14374 16822
demux.bc1012--bc1045.fastq.gz 17367 15300 17367
demux.bc1015--bc1033.fastq.gz 15534 12287 15534
demux.bc1015--bc1035.fastq.gz 15805 13202 15804
demux.bc1015--bc1044.fastq.gz 14849 11871 14848
demux.bc1015--bc1045.fastq.gz 16520 12640 16520
demux.bc1020--bc1033.fastq.gz 18154 15402 18154
demux.bc1020--bc1035.fastq.gz 13176 11063 13176
demux.bc1020--bc1044.fastq.gz 15992 13742 15991
demux.bc1020--bc1045.fastq.gz 15607 12940 15606
Both tools work well, without any error. I use the common 16S primers :
fwd <- "AGRGTTYGATYMTGGCTCAG"
rev <- "RGYTACCTTGTTACGACTT"
Also, if I stay with the removePrimers()
function , I read it also orients the reads. Does that mean it is not necessary to start the pipeline with the forward and reverse CCS reads? I can run dada2
with a pool of mixed fwd and rev reads? This could save information.
Best, Vincent
lima performs demultiplexing of the reads into the samples they came from using barcodes attached to the sequences. It also trims the barcodes.
removePrimers
then filters the read set down to those reads in which both the forward and reverse primers were detected, trims the read to the section between the forward and reverse primers, and (by default) orients the reads to all being in the forward-->reverse orientation.
So removePrimers
should be run after lima. The level of read loss you are seeing with removePrimers
is consistent with what we often observe. You can poke at the options of removePrimers
(see the help ?removePrimers
) and make the primer matching parameters more lenient if you want to try to keep slightly more reads at that step.
Does that mean it is not necessary to start the pipeline with the forward and reverse CCS reads? I can run dada2 with a pool of mixed fwd and rev reads?
Before removePrimers
the CCS reads are in mixed orientation. You could run DADA2 on that, but it isn't recommended as you'll end up with duplicates of every ASV at the end, and slightly reduced sensitivity to rare variants as well. So, removePrimers(..., orient=TRUE)
is the default and recommended way to run the command.
Thanks for your quick and complete feedback! I know lima
first trims the barcodes, and can trim the primers in a second run with the primer sequences (https://lima.how/faq/primer.html)
I guess lima
and removePrimers
do exactly the same thing when they trim the primers. That's why I was wondering there was a such difference between both tools.