dada2

Different results from different 'minoverlap' parameter

Open otaviolovison opened this issue 2 years ago • 1 comment

Hello!

I was analysing 16S NGS data for a study that will correlate culture vs. 16S NGS results. In the first analysis, I kept the default 'minoverlap' and got around 1100 ASVs, but did not find some pathogens previously isolated in culture, mainly Acinetobacter and Burkholderia-Caballeronia-Paraburkholderia. I then set 'minoverlap = 10', got around 1500 ASVs, and Acinetobacter and Burkholderia-Caballeronia-Paraburkholderia were finally found in the microbiome data. I would really like to understand these differences.

Thanks in advance!

otaviolovison Apr 26 '22 14:04

The minOverlap parameter defines the minimum amount the forward and reverse reads must overlap with one another for them to be merged. So, if the reads are too short to overlap sufficiently, they will be dropped by this step.

A couple of key things to remember here: it is the length of the reads after truncLen has been enforced that affects merging, and the amplicon length that matters is that of the sequenced amplicon, including primers if they are sequenced. There is also biological length variation even in 16S fragments, the V3V4 locus being a notable example, with two modes about 20 nts apart in length.
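To make the interaction concrete, here is a minimal Python sketch of the overlap arithmetic. It is not DADA2 code: it assumes each truncated read is shorter than the amplicon, so that overlap = forward length + reverse length - amplicon length, and it ignores mismatches within the overlap. The truncation and amplicon lengths are hypothetical; 12 is the default minOverlap in mergePairs.

```python
def overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the forward and reverse reads.

    Assumes each truncated read is shorter than the amplicon.
    A result <= 0 means the reads do not reach each other at all.
    """
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=12):
    """True if the overlap meets the minimum-overlap threshold."""
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Hypothetical truncation lengths and amplicon lengths (primers included).
trunc_f, trunc_r = 240, 200
for amplicon_len in (405, 429, 440):
    ov = overlap(trunc_f, trunc_r, amplicon_len)
    print(
        amplicon_len,
        ov,
        can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=12),
        can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=10),
    )
```

In this sketch the 429 nt amplicon overlaps by only 11 bases, so it merges at a threshold of 10 but not at 12. That is one way a taxon with a slightly longer amplicon could appear only after minOverlap is lowered, though the thread does not confirm that this is what happened here.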

As a rough rule of thumb, your truncation lengths should add up to 20 + the length of the sequenced amplicon in the longest organism being targeted.
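The rule of thumb can be written as a quick check. This is a minimal Python sketch, not part of DADA2; the function names and the example lengths are hypothetical, and treating the rule as a lower bound on the summed truncation lengths is an assumption of the sketch.

```python
def min_trunc_sum(longest_amplicon_len, margin=20):
    """Smallest summed truncation length under the rule of thumb:
    margin (20) + length of the longest sequenced amplicon."""
    return longest_amplicon_len + margin


def satisfies_rule(trunc_f, trunc_r, longest_amplicon_len, margin=20):
    """True if forward + reverse truncation lengths reach that sum."""
    return trunc_f + trunc_r >= min_trunc_sum(longest_amplicon_len, margin)


# Hypothetical longest sequenced amplicon, primers included.
longest = 430
print(min_trunc_sum(longest))             # summed length the rule asks for
print(satisfies_rule(240, 200, longest))  # 440 in total
print(satisfies_rule(250, 200, longest))  # 450 in total
```

With a 430 nt amplicon the rule asks for 450 nts in total, so truncating to 240 + 200 falls short while 250 + 200 meets it.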

benjjneb Apr 27 '22 17:04