mob-suite
Output mash tree or equivalent?
Hi,
It's nice that mob-suite tells me the plasmid in the reference database which is closest to each plasmid it identifies in my sample, but it would be good if it output a tree of mash distances so I can see how it relates to multiple plasmids in the database. Possibly within some mash distance threshold so that I don't end up with a tree with 12000 tips.
Or even just output the mash distance matrix of my plasmid vs everything in the database, so I can easily see how far it is from another plasmid of interest.
I can roll my own using mashtree, but others might find it useful?
Just a thought, thanks for the nice tool.
Best,
Phil
I will label this one as an enhancement for future versions. The clusters.txt file in the databases/ directory contains the typing information for all of the plasmids in the reference database. We have a primary cluster designation, meant for aggregating similar plasmids together at a mash distance of 0.06, and a secondary cluster designation at a mash distance of 0.025, which should capture near-duplicate sequences. You can select members of the same cluster in the file and build a tree with mash to see larger patterns. In our experience, draft and complete versions of the same plasmid can differ by up to 0.025 in mash distance, so if a plasmid shares the same secondary cluster, you will want to use a more sensitive technique such as SNP typing to distinguish them further.
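As a rough illustration of applying the 0.06 threshold mentioned above, here is a minimal Python sketch that filters the tab-separated output of `mash dist` (reference ID, query ID, distance, p-value, shared hashes) down to the references close to a query plasmid. The function name `close_references` and the example rows are made up for illustration; this is not part of MOB-suite.

```python
import csv
import io


def close_references(mash_dist_tsv, max_distance=0.06):
    """Return (reference_id, distance) pairs at or below max_distance.

    mash_dist_tsv is any iterable of lines in `mash dist` tabular format:
    reference ID, query ID, distance, p-value, shared hashes.
    """
    hits = []
    for row in csv.reader(mash_dist_tsv, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        reference_id, distance = row[0], float(row[2])
        if distance <= max_distance:
            hits.append((reference_id, distance))
    # Closest references first
    return sorted(hits, key=lambda hit: hit[1])


# Made-up rows standing in for real `mash dist` output
example = io.StringIO(
    "refA.fasta\tmy_plasmid.fasta\t0.040\t0\t400/1000\n"
    "refB.fasta\tmy_plasmid.fasta\t0.012\t0\t800/1000\n"
    "refC.fasta\tmy_plasmid.fasta\t0.210\t1e-20\t10/1000\n"
)
for reference_id, distance in close_references(example):
    print(reference_id, distance)
```

Passing `max_distance=0.025` instead would keep only references in the near-duplicate range described above.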
Dear Phil, although it is not exactly what you need in terms of a distance matrix to all database entries, you can try out our previous version (2.1.0) of MOB-Suite, which has a plasmid host-range phylogenetic tree reconstruction feature. It will build a phylogenetic tree based on plasmid features (replicon and cluster ID) and overlay it against all plasmid sequences and the corresponding taxonomy information in our database.
Thank you for the feature suggestion.
$ mob_typer -i plasmid.fasta -o mob-typer --host_range_detailed
Thanks both!
I am thinking of creating a series of single-linkage flat clusters based on the mash distances between your input and the reference database, and providing basic summary statistics on the average pairwise distance within the primary mob_cluster. This will constrain the number of samples and make the comparisons sensible.
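For anyone who wants to experiment with this idea in the meantime, here is a minimal sketch of single-linkage flat clustering at a fixed distance threshold, plus the mean pairwise distance within a cluster. It assumes the pairwise mash distances have already been computed and loaded into a dictionary. The function names and the example distances are hypothetical, and this is not how MOB-suite implements its clusters. Cutting a single-linkage tree at a threshold is equivalent to finding connected components among pairs at or below that threshold, which is what the union-find below does.

```python
from statistics import mean


def single_linkage_clusters(distances, threshold):
    """Group items linked by any chain of pairs with distance <= threshold.

    distances maps (item_a, item_b) tuples to a pairwise distance.
    """
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path compression
            item = parent[item]
        return item

    for (a, b), distance in distances.items():
        root_a, root_b = find(a), find(b)  # also registers both items
        if distance <= threshold and root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


def mean_pairwise_distance(distances, members):
    """Average distance over all listed pairs that fall inside one cluster."""
    members = set(members)
    within = [d for (a, b), d in distances.items() if a in members and b in members]
    return mean(within) if within else None


# Made-up distances between a query plasmid and three reference plasmids
example = {
    ("query", "p1"): 0.02, ("p1", "p2"): 0.05, ("query", "p2"): 0.08,
    ("query", "p3"): 0.30, ("p1", "p3"): 0.28, ("p2", "p3"): 0.31,
}
for cluster in single_linkage_clusters(example, threshold=0.06):
    print(cluster, mean_pairwise_distance(example, cluster))
```

Note that single linkage can chain: in the example, `query` and `p2` end up together at 0.06 even though their direct distance is 0.08, because both are close to `p1`.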