Chimera & sequence error removal
Hi Simone,
I've run the full pipeline and as I dig into my results I've begun to think more about sequence errors, chimeras, and contamination identification/removal.
Until now, I have only worked with Illumina data & ASV methods where I have dealt with chimeras and contamination using e.g. Deblur and Decontam. I am not so familiar with Nanopore data and OTU clustering. My questions are:
- Does the MetONTIIME pipeline address chimeras through the clustering process? Is it worthwhile running my table through something like qiime vsearch uchime-ref or uchime-denovo?
- Is there anything equivalent to deblur or DADA2 for denoising Nanopore data, or is this again addressed through clustering?
- Are there any tools available to identify potential contamination in Nanopore data? I realise this is beyond the scope of MetONTIIME but I can't find much discussion of this on the Nanopore community forum so thought I might as well ask here...
Thanks!
Dear isa,
- Does the MetONTIIME pipeline address chimeras through the clustering process? Is it worthwhile running my table through something like qiime vsearch uchime-ref or uchime-denovo?
In my understanding, chimeras are PCR artefacts in which a single amplicon is composed of the gene of interest from two different template molecules (frequently from different species). With short-read sequencing, you want to be sure that both ends of an ASV derive from the same species before feeding ASVs into your favourite classifier, which may be k-mer based. With alignment-based classification methods (such as BLAST and VSEARCH, both implemented in MetONTIIME), we can control for chimeras by adjusting the minQueryCoverage parameter (range 0 to 1), i.e. the fraction of each read that must align to its best match in the database for the alignment to be considered valid. A chimeric read may align to any single reference only partially, so if this value is high enough (it is set to 0.8 by default), you can be fairly confident that the retained reads are not chimeras. If the chimera is between different but similar species, you may still be able to recognise it by setting a high enough minIdentity value.
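To make the reasoning concrete, here is a minimal Python sketch of how a query-coverage and identity filter separates full-length reads from partially aligning (possibly chimeric) ones. It is not the MetONTIIME implementation: the `Hit` class, the function names, the example reads and the 0.9 identity threshold are all made up for illustration. Only the 0.8 coverage value comes from the default mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best alignment of one read against the database (hypothetical structure)."""
    read_id: str
    q_start: int     # 1-based start of the aligned region on the read
    q_end: int       # 1-based end of the aligned region on the read (inclusive)
    q_len: int       # full read length
    identity: float  # fraction of identical positions in the alignment, 0 to 1


def query_coverage(hit: Hit) -> float:
    """Fraction of the read that is covered by its best alignment."""
    return (hit.q_end - hit.q_start + 1) / hit.q_len


def keep_hit(hit: Hit, min_query_coverage: float, min_identity: float) -> bool:
    """Keep a hit only if it passes both the coverage and the identity threshold."""
    return (query_coverage(hit) >= min_query_coverage
            and hit.identity >= min_identity)


# Invented example reads, ~1500 bp as for a full-length 16S amplicon.
hits = [
    Hit("full_length_read", 1, 1480, 1500, 0.95),  # aligns almost end to end
    Hit("chimeric_read", 1, 750, 1500, 0.96),      # only the first half aligns
    Hit("divergent_read", 1, 1490, 1500, 0.80),    # full length, low identity
]

# 0.8 mirrors the minQueryCoverage default; 0.9 is an arbitrary illustrative value.
kept = [h.read_id for h in hits
        if keep_hit(h, min_query_coverage=0.8, min_identity=0.9)]
print(kept)
```

In this toy case the read whose second half does not align has a coverage of 0.5 and is dropped by the coverage threshold, while the full-length but divergent read is dropped by the identity threshold.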
- Is there anything equivalent to deblur or DADA2 for denoising Nanopore data, or is this again addressed through clustering?
Not that I am aware of. ONT recently integrated HERRO read correction into their Dorado suite, but it only works with reads longer than 10 kb. I recall that the Canu assembler also has a read correction routine, but I am not sure it would be a good idea to attempt error correction on amplicon data, as biological differences among strains may be small. As an alternative, outside the QIIME2 environment, you could test Emu and NGSpeciesID (the latter calls a consensus sequence for each cluster, but I'm not sure it is sensitive enough to distinguish between similar species). Note that with the MetONTIIME default parameters we are not actually performing any clustering: the command is run mainly to produce the required QIIME2 artefacts (feature tables with counts). You may consider enabling clustering by setting a lower value for the clusteringIdentity parameter, although this is something I have not tested thoroughly yet.
- Are there any tools available to identify potential contamination in Nanopore data? I realise this is beyond the scope of MetONTIIME but I can't find much discussion of this on the Nanopore community forum so thought I might as well ask here...
Personally, I usually run BLAST or Kraken2 against a database that also includes other genes and taxa. You can find some wrapper scripts in the MetaBlast and MetaKraken2 repositories.
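As a rough illustration of how such a run could be screened afterwards, the Python sketch below scans a standard Kraken2 report (tab-separated: percentage, clade reads, directly assigned reads, rank code, taxid, indented name) and lists genera that are not in a user-supplied set of expected taxa. The function name, the `min_reads` cutoff, the expected genera and the report rows are all invented for this example. It is not part of MetaKraken2 or MetONTIIME.

```python
def unexpected_genera(report_text, expected_genera, min_reads=10):
    """Return (genus, clade_reads) pairs for genera outside the expected set."""
    flagged = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])  # reads assigned to this clade or below
        rank = fields[3].strip()      # rank code, "G" means genus
        name = fields[5].strip()      # names are indented by taxonomic depth
        if rank == "G" and name not in expected_genera and clade_reads >= min_reads:
            flagged.append((name, clade_reads))
    # Most abundant unexpected genera first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up report rows, joined with tabs to mimic the Kraken2 report layout.
rows = [
    (" 90.00", "900", "0", "G", "1279", "          Staphylococcus"),
    ("  5.00", "50", "0", "G", "561", "          Escherichia"),
    ("  4.50", "45", "0", "G", "9605", "          Homo"),
    ("  0.50", "5", "0", "G", "48736", "          Ralstonia"),
]
report = "\n".join("\t".join(row) for row in rows)

flagged = unexpected_genera(report, expected_genera={"Staphylococcus", "Escherichia"})
print(flagged)
```

Which genera count as "expected", and which read-count cutoff is sensible, depend entirely on the sample type and on negative controls, so both are left as parameters here.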
I hope this helps! Let me know your opinion on this. Best, SM
Hi, closing due to inactivity. Feel free to reopen it if you have any further questions. Best, SM