Roary for metagenomes

Open YiJessePi opened this issue 5 years ago • 2 comments

I've read in the documentation that "Roary is not intended for meta-genomics or for comparing extremely diverse sets of genomes". Can you please explain why? What are the drawbacks of using it on gene calling from metagenomic data?

YiJessePi avatar Oct 27 '19 20:10 YiJessePi

Roary is fast because it expects lots of very similar proteins and uses cd-hit to collapse them first. After that it falls back to all-vs-all blastp on whatever is left. Metagenomes have lots of genes, and they are too diverse for that first step to help much. Say you have N genes: Roary will then take N x N time to run, and in practice it will never finish. Consider other tools such as proteinortho or MMseqs2, or use cd-hit directly.
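To put rough numbers on the N x N point, here is a minimal sketch that only counts sequence pairs. It does not call Roary, cd-hit, or blastp, and the function name and both gene counts are hypothetical values picked for illustration, not measurements.

```python
def all_vs_all_comparisons(n_sequences: int) -> int:
    """Number of query/subject pairs in an all-vs-all search (N x N)."""
    return n_sequences * n_sequences


# Hypothetical sizes, chosen only for illustration.
after_clustering = 10_000     # representatives left once near-identical proteins are collapsed
metagenome_genes = 1_000_000  # a gene set too diverse to collapse much

small = all_vs_all_comparisons(after_clustering)
large = all_vs_all_comparisons(metagenome_genes)

print(f"{small:,} pairs")               # 100,000,000 pairs
print(f"{large:,} pairs")               # 1,000,000,000,000 pairs
print(f"{large // small:,}x more work") # 10,000x more work
```

Under these assumed sizes, a 100-fold increase in input sequences means a 10,000-fold increase in pairs to compare, which is why the pre-clustering step matters so much.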

tseemann avatar Oct 29 '19 05:10 tseemann

Thanks Torsten! So is it just a matter of run time? I actually plan to run Roary on reconstructed bins of the same species (does pangenome analysis across different species even make sense?), which I assume will have a similar number of genes to an isolate genome.

YiJessePi avatar Oct 29 '19 08:10 YiJessePi