drep to consider rRNA genes?

Open Kirk3gaard opened this issue 3 years ago • 2 comments

Hi

Thanks for a great tool.

Have you considered a scoring scheme that takes the presence of e.g. rRNA genes into account, to prioritize bins meeting more of the MIMAG requirements? (https://www.nature.com/articles/nbt.3893/tables/1)

I just ran a mix of short-read and long-read bins through dRep and was surprised that some of the short-read bins got a higher score than the matching long-read bins. It turns out that the much improved N50 came with a slight increase in contamination (likely overextension). This can be fixed by changing the weights, as short-read bins tend to have artificially low contamination scores. However, I think that scoring the presence of rRNA + tRNA genes would be a nice add-on to the current model.
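To illustrate the effect of the weights, here is a minimal sketch. It is a simplified stand-in for the scoring, not dRep's full formula: it keeps only the completeness, contamination, and log10(N50) terms, and it assumes default weights of 1, 5, and 0.5 for those terms. The function name `simple_score` and both example bins are hypothetical.

```python
import math


def simple_score(completeness, contamination, n50,
                 com_w=1.0, con_w=5.0, n50_w=0.5):
    """Simplified score: completeness minus weighted contamination
    plus weighted log10(N50). The default weights are assumptions."""
    return (com_w * completeness
            - con_w * contamination
            + n50_w * math.log10(n50))


# Hypothetical bins for the same genome (made-up values).
short_read = dict(completeness=95.0, contamination=0.5, n50=20_000)
long_read = dict(completeness=95.0, contamination=1.5, n50=2_000_000)

for label, weights in [("assumed defaults", {}),
                       ("adjusted", {"con_w": 1.0, "n50_w": 1.0})]:
    s = simple_score(**short_read, **weights)
    l = simple_score(**long_read, **weights)
    print(f"{label}: short-read={s:.2f} long-read={l:.2f}")
```

With the assumed defaults, one extra percentage point of contamination costs more than a 100-fold higher N50 gains, so the short-read bin ranks first. Lowering the contamination weight and raising the N50 weight reverses the order in this toy case.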

Best regards, Rasmus

Kirk3gaard, Apr 07 '21 08:04

Hi, I have seen the same issue, with long-read bins scoring lower than short-read-only bins. When adjusting the scores, would you lower the contamination weight or increase the N50 weight? If you have some suggested values, that would be great! Camilla

(I agree that it would be good to score the presence of rRNA + tRNA genes, provided they are on a long contig or are found to match the genome.)

camillaln, Apr 07 '21 15:04

Hi Rasmus and Camilla,

Thanks for the feedback. I haven't used many long-read bins in my own research, so I wasn't aware of this issue; thanks for bringing it up.

It's a good point about including rRNA / tRNA genes in the scoring algorithm in accordance with MIMAG; I'll look into it for the next dRep version. In the meantime, you could replicate this functionality using the --extra_weight_table option to add to scores based on rRNA / tRNA genes identified using external programs.
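As a rough sketch of how such a table could be built: the code below assumes --extra_weight_table takes a tab-separated file with two columns and no header (genome name, extra score), and that genome names match the genome file names given to dRep. The rRNA / tRNA results, the point values, and the helper names are all hypothetical; the thresholds (5S, 16S, and 23S rRNA plus at least 18 tRNAs) follow the MIMAG high-quality criteria.

```python
import csv
import io

RRNA_TYPES = ("5S", "16S", "23S")  # rRNA genes required by MIMAG high-quality


def extra_weight(rrna_found, trna_count,
                 rrna_points=2.0, trna_points=2.0, min_trnas=18):
    """Hypothetical bonus: points per rRNA type found, plus points
    when the tRNA count reaches the MIMAG threshold."""
    score = rrna_points * sum(1 for r in RRNA_TYPES if r in rrna_found)
    if trna_count >= min_trnas:
        score += trna_points
    return score


def write_extra_weight_table(annotations, handle):
    """Write 'genome<TAB>extra_score' rows with no header (assumed format)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for genome, (rrna_found, trna_count) in sorted(annotations.items()):
        writer.writerow([genome, extra_weight(rrna_found, trna_count)])


# Made-up results, as if parsed from external rRNA / tRNA predictors.
annotations = {
    "long_read_bin.fa": ({"5S", "16S", "23S"}, 20),
    "short_read_bin.fa": (set(), 12),
}

buf = io.StringIO()
write_extra_weight_table(annotations, buf)
print(buf.getvalue(), end="")
```

Writing the same rows to a file instead of `io.StringIO` gives something that could be passed to --extra_weight_table. The point values would need tuning against the other score terms so that the bonus neither dominates nor vanishes.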

Best, Matt

MrOlm, Apr 07 '21 17:04