Mash
Mash screen winner-take-all and multiple best number of hash hits
Hello, I have noticed a discrepancy between the results of mash screen with and without the winner-take-all (WTA) option. For context, I am screening a number of genomes (more specifically, their constituent contigs) to see whether they contain plasmids (the plasmids have been sketched with -s 1000 -k 21).
I have found that if a contig has hits to multiple plasmids, and the best plasmid hits have the same number of hash hits in a non-WTA screen, this result does not appear in the WTA screen. It appears that WTA cannot choose between the tied best hits and therefore picks none of them.
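In case it helps to pin down the affected contigs, here is a minimal sketch of how the two result tables could be compared. It assumes the usual tab-separated mash screen output (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment); the function names and the example rows are made up for illustration and are not my real data.

```python
def parse_screen(text):
    """Map each query-ID to its shared-hash count from mash screen output text."""
    hits = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Column 2 looks like "950/1000": shared hashes / sketch size.
        shared = int(fields[1].split("/")[0])
        hits[fields[4]] = shared
    return hits


def tied_top_hits_lost(plain_text, wta_text):
    """List references tied for the best shared-hash count without WTA
    that are absent (or have zero shared hashes) with WTA."""
    plain = parse_screen(plain_text)
    wta = parse_screen(wta_text)
    if not plain:
        return []
    best = max(plain.values())
    tied = [ref for ref, shared in plain.items() if shared == best]
    if len(tied) < 2:
        return []  # no tie for first place, nothing to report
    return sorted(ref for ref in tied if wta.get(ref, 0) == 0)


# Hypothetical rows for one contig (values invented for illustration).
PLAIN = "\n".join([
    "0.9976\t950/1000\t1\t0\tplasmid_A\t",
    "0.9976\t950/1000\t1\t0\tplasmid_B\t",
    "0.9573\t400/1000\t1\t0\tplasmid_C\t",
])
WTA = "0.8101\t12/1000\t1\t0\tplasmid_C\t"

print(tied_top_hits_lost(PLAIN, WTA))
```

Running this per contig on the real non-WTA and WTA outputs would give a list of the tied plasmids that vanish, which is the pattern I am describing.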
I have yet to try it, but I think larger sketches and k-mer values could circumvent this problem. Having said this, the plasmids I am using are from PLSDB, a very extensive catalogue of plasmids that contains distinct but very similar plasmids, so a bottom sketch method may still produce this problem.
I hope I have worded this clearly, please let me know if you need more detail.
Kind regards,
Adam

P.S. I finished a master's in bioinformatics a few months ago, where I assumed BLAST was king of the heuristics. It has been really interesting to work with Mash. Thanks!
Hi Adam, I think I understand the problem but I am not able to reproduce it. Is it possible that these plasmids are disappearing with WTA because all their hashes are assigned to another, better-scoring plasmid? A way to test that would be to isolate the ties in their own sketch, then start adding other ones. I can look into this more as well if you are able to share your data (my email is here).
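To make the hypothesis concrete, here is a toy model of winner-take-all hash assignment. It is only a sketch under stated assumptions, not Mash's actual code: sketches are plain Python sets of integers, each shared hash is credited to the single best-ranked reference containing it, and ties are broken by sketch size and then by name (the name tie-break is purely an assumption to keep the example deterministic).

```python
def shared_counts(refs, mixture):
    """Shared-hash count per reference without winner-take-all."""
    return {name: len(hashes & mixture) for name, hashes in refs.items()}


def winner_take_all(refs, mixture):
    """Credit each shared hash to exactly one reference (toy model)."""
    counts = shared_counts(refs, mixture)

    def rank(name):
        # Assumption: more shared hashes wins, then larger sketch, then name.
        return (counts[name], len(refs[name]), name)

    wta = {name: 0 for name in refs}
    for h in mixture:
        owners = [name for name, hashes in refs.items() if h in hashes]
        if owners:
            wta[max(owners, key=rank)] += 1
    return wta


# Hypothetical sketches: pA and pB are identical, pC overlaps them partly.
refs = {
    "pA": {1, 2, 3, 4, 5},
    "pB": {1, 2, 3, 4, 5},
    "pC": {1, 2, 3, 6},
}
contig = {1, 2, 3, 4, 5, 6, 7}

print(shared_counts(refs, contig))
print(winner_take_all(refs, contig))
```

In this model one of the two tied plasmids keeps all five hashes and the other drops to zero, while pC keeps only the hash it does not share with the winner. If the real data behaved the same way, a tied plasmid could disappear because its hashes went to a different reference, which is what isolating the ties in their own sketch should reveal.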