
Output mash tree or equivalent?

Open · flashton2003 opened this issue 4 years ago · 4 comments

Hi,

It's nice that mob-suite tells me the plasmid in the reference database which is closest to each plasmid it identifies in my sample, but it would be good if it output a tree of mash distances so I can see how it relates to multiple plasmids in the database. Possibly within some mash distance threshold so that I don't end up with a tree with 12000 tips.

Or even just output the mash distance matrix of my plasmid vs everything in the database, so I can easily see how far it is from another plasmid of interest.
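A minimal Python sketch of the threshold idea, assuming you have already run something like `mash dist reference_sketch.msh plasmid.fasta > dists.tsv` (file names hypothetical) and that the output follows mash's tab-separated layout of Reference-ID, Query-ID, Mash-distance, P-value and Matching-hashes. The function `hits_within` is illustrative only and not part of MOB-suite.

```python
import csv
from typing import Iterable, List, Tuple


def hits_within(lines: Iterable[str], query_id: str, max_dist: float) -> List[Tuple[str, float]]:
    """Return (reference_id, distance) pairs for one query, closest first.

    Assumes the headerless tab-separated layout written by `mash dist`:
    Reference-ID, Query-ID, Mash-distance, P-value, Matching-hashes.
    """
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ref_id, qry_id, dist = row[0], row[1], float(row[2])
        # Keep only rows for this query that fall inside the distance cut-off.
        if qry_id == query_id and dist <= max_dist:
            hits.append((ref_id, dist))
    # Sort by distance, then by reference ID so ties come out in a stable order.
    hits.sort(key=lambda h: (h[1], h[0]))
    return hits
```

For example, `hits_within(open("dists.tsv"), "plasmid.fasta", 0.06)` would list every reference within 0.06 of the query, using whatever query ID mash wrote in column 2. The short list it returns is a manageable set of tips for a tree, instead of all 12000 reference plasmids.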

I can roll my own using mashtree, but others might find it useful?
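If you do roll your own, the tree-building step can be sketched as below. This uses UPGMA, a simple average-linkage method chosen for brevity; it is not mashtree's own algorithm (mashtree builds neighbor-joining trees), so topology and branch lengths may differ. The function name and the input format (a list of names plus a symmetric distance matrix, for example built from a filtered `mash dist` table) are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def upgma_newick(names: List[str], matrix: List[List[float]]) -> str:
    """Build a rooted UPGMA tree from a symmetric distance matrix; return Newick."""
    # Active clusters: id -> (Newick text, number of leaves, height of its root).
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # Pairwise cluster distances, keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {
        (i, j): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair; ties are broken by the smaller id pair.
        (a, b), d_ab = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2.0
        # Branch length = new node height minus child height (4 significant figures).
        text = f"({text_a}:{height - h_a:.4g},{text_b}:{height - h_b:.4g})"
        # UPGMA update: size-weighted mean of the two old distances.
        merged = {
            c: (size_a * dist[(min(a, c), max(a, c))]
                + size_b * dist[(min(b, c), max(b, c))]) / (size_a + size_b)
            for c in clusters
        }
        # Drop rows for a and b, then add the new cluster under the largest id so far.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, d in merged.items():
            dist[(c, next_id)] = d
        clusters[next_id] = (text, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For three plasmids where A and B are near duplicates, `upgma_newick(["A", "B", "C"], [[0, 0.02, 0.10], [0.02, 0, 0.12], [0.10, 0.12, 0]])` returns `(C:0.055,(A:0.01,B:0.01):0.045);`.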

Just a thought, thanks for the nice tool.

Best,

Phil

flashton2003 · May 12 '20 08:05

I will label this one as an enhancement for future versions. The clusters.txt file in the databases/ directory contains the typing information for all of the plasmids in the reference database. We have a primary cluster designation, meant to group similar plasmids together at a mash distance of 0.06, and a secondary cluster designation at a distance of 0.025, which should capture near-duplicate sequences. You can select the members of a cluster from that file and build a tree from them with mash to see larger patterns. In our experience, draft and complete versions of the same plasmid can differ by up to 0.025 in mash distance, so if two plasmids share a cluster at that level, you will want to use a more sensitive technique such as SNP typing to distinguish them further.
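Selecting the members of one cluster can be sketched as follows. This assumes clusters.txt is tab-separated with a header row; the column names `sample_id`, `primary_cluster_id` and `secondary_cluster_id` used here are hypothetical placeholders, so check the actual header in your databases/ directory and pass the correct names.

```python
import csv
from typing import Iterable, List


def cluster_members(lines: Iterable[str], cluster_id: str,
                    cluster_col: str = "primary_cluster_id",
                    id_col: str = "sample_id") -> List[str]:
    """Return the IDs of reference plasmids assigned to `cluster_id`.

    The default column names are hypothetical placeholders; override them
    to match the header row of your clusters.txt.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Keep rows whose cluster column matches exactly, preserving file order.
    return [row[id_col] for row in reader if row[cluster_col] == cluster_id]
```

The returned IDs could then be used to pull the matching sequences out of the reference FASTA and build a tree with mash or mashtree. Passing the secondary-cluster column as `cluster_col` narrows the set to near duplicates.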

jrober84 · May 22 '20 17:05

Dear Phil, although it is not exactly the distance matrix against all database entries that you asked for, you can try out our previous version (2.1.0) of MOB-Suite, which has a plasmid host-range phylogenetic tree reconstruction feature. It builds a phylogenetic tree based on plasmid features (replicon and cluster ID) and overlays it against all plasmid sequences and the corresponding taxonomy information in our database.

Thank you for the feature suggestion.

$ mob_typer -i plasmid.fasta -o mob-typer --host_range_detailed

kbessonov1984 · May 22 '20 17:05

Thanks both!

flashton2003 · May 28 '20 12:05

I am thinking of creating a series of single-linkage flat clusters based on mash distances between your input and the reference database, and providing basic summary statistics on the average pairwise distance within the primary mob_cluster. This would constrain the number of samples and keep the comparisons sensible.
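One way to sketch that proposal, assuming the pairwise mash distances among your input and the candidate references are already in a dictionary keyed by ID pairs: link every pair at or below a threshold, take the connected components as flat clusters (via union-find), then average the pairwise distances inside each cluster. The function names are hypothetical and this is not MOB-suite code.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Dist = Dict[Tuple[str, str], float]


def single_linkage_clusters(names: List[str], dist: Dist, threshold: float) -> List[List[str]]:
    """Flat single-linkage clusters: pairs at distance <= threshold are linked,
    and the clusters are the connected components of those links."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def mean_pairwise_distance(members: List[str], dist: Dist) -> Optional[float]:
    """Average distance over all unordered pairs in a cluster; None for singletons.
    Every pair must be present in `dist` in one orientation or the other."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return None
    total = sum(dist[(a, b)] if (a, b) in dist else dist[(b, a)] for a, b in pairs)
    return total / len(pairs)
```

Because single linkage chains, two members of one cluster can be further apart than the threshold, which is why the average pairwise distance is a useful companion statistic. SciPy's `scipy.cluster.hierarchy.linkage(..., method="single")` followed by `fcluster(..., criterion="distance")` should yield the same partition from a condensed distance matrix.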

jrober84 · May 26 '22 17:05