DeepMAsED
Bowtie2 (bad greedy) and read multimapping for metagenomes
DeepMAsED is designed for metagenomic NGS data sets, and Bowtie2 is not (the author says so in the manual).
- BT2 is a greedy matcher: matches with very low %ID will still be reported. It was designed for aligning reads to a single eukaryotic genome, with splicing and SNPs, and is optimized to find the first best hit.
- The BT2 manual is incomprehensible when you try to tune its settings to something like a %ID cutoff.
- 75% of all bacterial genes are orthologs (I curated all 529 million genes in NCBI+JGI, so I know), and metagenomes are replete with many strains of the same species, so you have to multimap the reads.
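To make the %ID point concrete, here is a minimal sketch of how a percent identity can be approximated for a single SAM record, whatever aligner produced it. It assumes the aligner wrote the standard `NM:i:` (edit distance) tag; the function name `percent_identity` is hypothetical and is not part of Bowtie2, BBMap, or DeepMAsED.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(sam_line):
    """Approximate %ID of one SAM alignment line.

    Computed as (alignment columns - NM) / alignment columns, where the
    columns are matches/mismatches (M, =, X) plus insertions and deletions.
    Returns None if the record has no CIGAR or no NM tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":
        return None
    columns = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=XID")
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None or columns == 0:
        return None
    return 100.0 * (columns - nm) / columns
```

Alignments below a chosen cutoff (95 in the runs described here) could then be dropped before any downstream step, which gives a %ID-style filter without relying on the aligner's own scoring options.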
Instead of Bowtie2, I ran BBMap at 95% ID, with and without multimapping, and then ran MEC on each mapping (I still haven't gotten DeepMAsED to work; see my other reported issue):
- no multimapping (each read randomly assigned to one of its best hits): #split_num 741
- with read multimapping: #split_num 5322
You can see there is quite a difference, and I think you'll find the same with DeepMAsED.
Orthology/multimapping is a major issue. You may find quite a bit more than 1% chimeras!
Please trust me and check it out.
I plan to check the results with MetaQUAST to see which is correct, once I get DeepMAsED working.
The REAL question is: what will your software do if I feed it a BAM file with multimapped reads?
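As a quick way to see how much multimapping a given alignment file actually contains, here is a rough sketch that counts reads with more than one reported alignment. It works on SAM text lines and is not anything DeepMAsED does; the function name `count_multimapped` is hypothetical. It assumes multimapped reads appear as several records with the same read name, and it skips unmapped (0x4) and supplementary (0x800) records.

```python
from collections import Counter


def count_multimapped(sam_lines):
    """Return (multimapped_reads, mapped_reads) for an iterable of SAM lines.

    Mates of a pair share a read name, so each read is keyed by its name plus
    the first/second-in-pair flag bits (0x40, 0x80).
    """
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x800:  # unmapped or supplementary record
            continue
        hits[(fields[0], flag & 0xC0)] += 1
    multimapped = sum(1 for n in hits.values() if n > 1)
    return multimapped, len(hits)
```

For a BAM file, the lines can be produced with `samtools view -h` and passed in from standard input, for example `count_multimapped(sys.stdin)`.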
Best, Teal