DeepMAsED

Bowtie2 (bad greedy) and read multimapping for metagenomes

Open · TealFurnholm opened this issue 4 years ago · 1 comment

DeepMAsED is designed for metagenomic (meta-NGS) data sets, but Bowtie2 is not (its author says so in the manual):

  • BT2 is a greedy matcher, so even very low %ID matches are still reported. It was designed for aligning reads to a single eukaryotic genome, with splicing and SNPs, and is optimized to find the first best hit.
  • The BT2 manual is so hard to follow that it is nearly impossible to work out how to tune its scoring to something equivalent to a minimum %ID.
  • 75% of all bacterial genes are orthologs (I curated all 529 million genes from NCBI and JGI, so I know), and metagenomes are full of strains from the same species. As a result, you have to multimap the reads.
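Because Bowtie2 has no direct %ID cutoff, one possible workaround is to post-filter its alignments by identity. The sketch below is only an illustration, not what DeepMAsED or BBMap do internally: it assumes the aligner writes the standard SAM `NM` tag (edit distance), and it uses one common definition of identity, `(alignment columns - NM) / alignment columns`, where columns are the CIGAR M/=/X, I and D operations. The function names `percent_identity` and `passes_min_id` are hypothetical.

```python
import re

# CIGAR operations: number followed by an operation letter
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Percent identity of one alignment from its CIGAR string and NM tag.

    Alignment columns = M/=/X (match or mismatch) + I + D. Soft/hard clips (S/H),
    reference skips (N) and padding (P) are ignored. This is ONE common
    definition; other aligners may define identity differently.
    """
    columns = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=XID")
    if columns == 0:
        raise ValueError(f"CIGAR {cigar!r} has no aligned columns")
    return 100.0 * (columns - nm) / columns


def passes_min_id(cigar: str, nm: int, min_id: float = 95.0) -> bool:
    """True if the alignment meets a minimum percent identity (default 95%)."""
    return percent_identity(cigar, nm) >= min_id


if __name__ == "__main__":
    # 100 aligned columns with 3 edits -> 97% identity, kept at a 95% cutoff
    print(percent_identity("100M", 3), passes_min_id("100M", 3))
    # 140 aligned columns (soft clip ignored) with 10 edits -> below 95%, dropped
    print(percent_identity("10S140M", 10), passes_min_id("10S140M", 10))
```

Filtering after alignment does not change which hit Bowtie2 picked for a read, so it cannot recover the missing multimapped hits; it only removes the low-identity ones.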

Instead of Bowtie2, I ran BBMap at 95% ID, once with and once without read multimapping, and then ran MEC on each result (since I still haven't gotten DeepMAsED to work; see my other reported issue):

  • no multimapping (each read randomly assigned to one of its best hits): #split_num 741
  • with read multimapping: #split_num 5322
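The toy sketch below illustrates the difference between the two runs: how reads with several equally good hits are distributed over contigs under each policy. It is an assumption-laden illustration, not BBMap's code; the two settings presumably correspond to something like BBMap's `ambig=random` and `ambig=all` options, and the function `contig_read_counts` and the input layout are hypothetical.

```python
import random
from collections import Counter


def contig_read_counts(hits, multimap, seed=0):
    """Count reads per contig under two policies for ambiguous reads.

    hits maps read_id -> list of (contig, alignment_score).
    multimap=False: each read goes to ONE randomly chosen best-scoring contig.
    multimap=True:  each read is counted once on EVERY best-scoring contig.
    """
    rng = random.Random(seed)  # seeded so the random policy is reproducible
    counts = Counter()
    for read_id, alns in hits.items():
        if not alns:
            continue  # unmapped read
        best = max(score for _, score in alns)
        best_contigs = sorted({c for c, s in alns if s == best})
        if multimap:
            counts.update(best_contigs)
        else:
            counts[rng.choice(best_contigs)] += 1
    return counts


if __name__ == "__main__":
    # r1 hits two orthologous contigs equally well; r3 has one clear best hit
    hits = {
        "r1": [("c1", 60), ("c2", 60)],
        "r2": [("c1", 60)],
        "r3": [("c2", 50), ("c3", 60)],
    }
    print(contig_read_counts(hits, multimap=False))
    print(contig_read_counts(hits, multimap=True))
```

With random assignment, only one of the orthologous contigs shows support from `r1`. With multimapping, both do, which is the kind of extra signal that can change how many contigs a misassembly detector flags.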

You can see there is quite a difference (741 vs. 5322, about 7 times as many with multimapping), and I think you'll find the same with DeepMAsED. Orthology/multimapping is a major issue. You may find quite a bit more than 1% chimeras!
Please trust me and check it out.

Once I get DeepMAsED working, I plan to check the results with MetaQUAST to see which is correct.

The REAL question is: what will your software do if I feed it a BAM file with multimapped reads?
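Before feeding a BAM to any tool, it can help to check how many reads in it actually have extra hits. The sketch below is a minimal, hypothetical helper (`multimap_summary`) that works on SAM text, for example as produced by `samtools view`. It assumes the aligner marks the extra hits of a multimapped read as secondary alignments (SAM FLAG bit `0x100`); aligners differ here, so this is an assumption to verify for BBMap's output. It says nothing about how DeepMAsED itself treats such records.

```python
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment (an extra hit for a read)


def multimap_summary(sam_lines):
    """Return (n_reads, n_multimapped_reads) for an iterable of SAM text lines.

    A read (QNAME) counts as multimapped if any of its records has the
    secondary bit set. Header lines starting with '@' and blank lines are skipped.
    """
    reads, multimapped = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        reads.add(qname)
        if flag & SECONDARY:
            multimapped.add(qname)
    return len(reads), len(multimapped)


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tc1\t100\t1\t100M\t*\t0\t0\t*\t*",
        "r1\t256\tc2\t500\t1\t100M\t*\t0\t0\t*\t*",  # secondary hit of r1
        "r2\t16\tc1\t900\t42\t100M\t*\t0\t0\t*\t*",
    ]
    print(multimap_summary(sam))
```

If a tool turns out to expect one record per read, `samtools view -F 0x100` drops secondary alignments and keeps only the primary hit for each read, which effectively reproduces the no-multimapping case.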

Best, Teal

TealFurnholm, Nov 12 '20 16:11