
Recover rDNA from metagenomes

Open SilasK opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. For generic questions use Q&A section in the Discussions forum above.

rRNA and tRNA genes are thought to be difficult to assemble from metagenomes because of their sequence similarity. I found that metaSPAdes already does a good job of assembling some of them. However, I'm not sure about their quality.

Describe the solution you'd like

Ideally, I would like to search for rRNA (and tRNA) genes during the assembly of metagenomes and then pay special attention to their assembly (using normal SPAdes).

Describe alternatives you've considered

I saw phyloFlash, which does more or less this. But I think it would be ideal to have it inside the SPAdes workflow.

Additional context

ABSTRACT The small-subunit rRNA (SSU rRNA) gene is the key marker in molecular ecology for all domains of life, but it is largely absent from metagenome-assembled genomes that often are the only resource available for environmental microbes. Here, we present phyloFlash, a pipeline to overcome this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based binning of full metagenomic assemblies. We show that a cleanup of artifacts is pivotal even with a curated reference database. With such a filtered database, the general-purpose mapper BBmap extracts SSU rRNA reads five times faster than the rRNA-specialized tool SortMeRNA with similar sensitivity and higher selectivity on simulated metagenomes. Reference-based targeted assemblers yielded either highly fragmented assemblies or high levels of chimerism, so we employ the general-purpose genomic assembler SPAdes. Our optimized implementation is independent of reference database composition and has satisfactory levels of chimera formation. phyloFlash quickly processes Illumina (meta)genomic data, is straightforward to use, even as part of high-throughput quality control, and has user-friendly output reports. The software is available at https://github.com/HRGV/phyloFlash (GPL3 license) and is documented with an online manual.

SilasK avatar Jul 01 '21 08:07 SilasK

You are correct that it is often not possible to obtain full-length 16S assemblies from metagenomes. There are many reasons for this; however, all of them can be reduced to a single word: "repeats". Ribosomal genes exist in multiple copies in the genome, making assembly difficult. This difficulty is greatly amplified by the presence of closely related species / strains in the metagenome, as the repeats become interspecies. MAGs will certainly miss rRNA contigs, as binning is only done for contigs longer than 1.5 - 2 kbp and the rRNA contigs will certainly be shorter. Proper bin refinement procedures could help here.
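To illustrate the length cutoff mentioned above, here is a minimal Python sketch that splits contigs into those a length-filtered binner would see and those it would skip. The function names, the 1500 bp default, and the toy sequences are assumptions made for illustration; actual binners apply their own thresholds and logic.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_len=1500):
    """Separate contigs at or above min_len from shorter ones.

    min_len=1500 is an assumed cutoff taken from the lower end of the
    1.5 - 2 kbp range quoted in the thread, not a setting of any tool.
    """
    kept, skipped = [], []
    for name, seq in records:
        (kept if len(seq) >= min_len else skipped).append((name, len(seq)))
    return kept, skipped


# Toy input: one long contig and one short fragment (sequences are made up).
example = [">contig_long", "A" * 3000, ">contig_short", "ACGT" * 200]
kept, skipped = split_by_length(read_fasta(example))
print("would be binned:", kept)
print("would be skipped:", skipped)
```

Running the same split on a real assembly's contig file and cross-checking the skipped list against rRNA predictions would show how many rRNA-bearing contigs never reach the binner.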

There is no safe and reliable way to assemble complete rRNA genes due to fundamental limitations of short reads (e.g. fragment length that does not span the repeat in question, etc). I'm afraid many external "pipelines" are prone to produce chimeric results. That said, I'm not sure what could be the "special attention" here :)

asl avatar Jul 14 '21 16:07 asl

If metaSPAdes can assemble 16S rRNA genes that are then binned into high-quality bins, would you trust these sequences? Obviously you could do some checks if it's only a contig with a 16S gene, for example whether the taxonomy matches. But what about the case where you have an rRNA gene inside a longer scaffold?

I got a reviewer who said you can't trust them.
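The two cases distinguished above (a contig that is essentially only the 16S gene versus an rRNA gene inside a longer scaffold) can be told apart from gene coordinates. The sketch below assumes 1-based inclusive coordinates from any rRNA predictor; the function name, labels, and the 1000 bp flank threshold are hypothetical. It only describes the context of the gene and does not by itself say whether the sequence is chimeric.

```python
def rrna_context(contig_len, gene_start, gene_end, min_flank=1000):
    """Describe where an rRNA gene sits on a contig.

    Coordinates are 1-based and inclusive. min_flank is an assumed,
    arbitrary threshold for calling a flank "substantial".
    """
    if not (1 <= gene_start <= gene_end <= contig_len):
        raise ValueError("gene coordinates fall outside the contig")
    left = gene_start - 1           # bases upstream of the gene
    right = contig_len - gene_end   # bases downstream of the gene
    if left >= min_flank and right >= min_flank:
        label = "embedded"          # flanked on both sides by other sequence
    elif left >= min_flank or right >= min_flank:
        label = "one-sided"         # gene sits near one end of the contig
    else:
        label = "rRNA-only"         # contig is little more than the gene
    return {"left_flank": left, "right_flank": right, "label": label}


# Made-up examples: a short 16S-only contig and a gene inside a long scaffold.
print(rrna_context(1600, 50, 1550))
print(rrna_context(25000, 10000, 11500))
```

For "embedded" genes, the flanking sequence gives something to check against the rest of the bin (for example, coverage and taxonomy of neighbouring genes), which an "rRNA-only" contig cannot offer.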

SilasK avatar Aug 09 '21 19:08 SilasK