Adding Clustering via Extended Similarity Metrics

Open drroe opened this issue 2 years ago • 5 comments

In collaboration with @ramirandaq and @lexin-chen, expand the cluster analysis capabilities of cpptraj by adding clustering via extended similarity metrics (and more).

Some background reading:

https://link.springer.com/article/10.1186/s13321-021-00505-3

https://link.springer.com/article/10.1007/s10822-022-00444-7

drroe, Sep 13 '23 14:09

Here https://github.com/Amber-MD/cpptraj/pull/1051#event-10450499445 it says "Calculate extended comparison similarity values for each trajectory frame." Is this the complementary similarity used to then find medoids and outliers in the trajectory?

ramirandaq, Sep 22 '23 20:09

> Is this the complementary similarity used to then find medoids and outliers in the trajectory?

Yes - it's equivalent to the gen_sim_dict routine from src/tools/esim_modules.py in MDANCE.

drroe, Sep 25 '23 14:09

gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame. To calculate the outliers and medoids, the function is calculate_comp_sim (in src/tools/bts.py). The complementary similarity does assign a number to every frame in a set, which can be used to rank the frames from high- to low-density.
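To make the distinction concrete, here is a minimal sketch of the two quantities. It is not the MDANCE or cpptraj code: the function names `msd` and `complementary_msd` are hypothetical, mean squared deviation (MSD) stands in for the extended index, and the normalization (mean over all ordered pairs) is an assumption chosen for simplicity. Because MSD measures dissimilarity, the frame whose removal leaves the largest value is taken as the medoid, and the frame whose removal leaves the smallest value as the outlier.

```python
def msd(frames):
    """Set-level value: one number for the whole set of frames.

    Mean squared deviation over all ordered pairs of frames, computed
    from per-coordinate sums and sums of squares (assumed normalization).
    """
    n = len(frames)
    dims = len(frames[0])
    c_sum = [sum(f[k] for f in frames) for k in range(dims)]
    sq_sum = [sum(f[k] ** 2 for f in frames) for k in range(dims)]
    return 2.0 * sum(n * s - c * c for s, c in zip(sq_sum, c_sum)) / n ** 2


def complementary_msd(frames):
    """Per-frame values: the MSD of the set with frame i left out."""
    return [msd(frames[:i] + frames[i + 1:]) for i in range(len(frames))]


# Toy "trajectory": four frames with two coordinates each (made-up data).
frames = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]]

comp = complementary_msd(frames)
indices = range(len(frames))

# Removing the medoid leaves the most spread-out set (largest MSD).
medoid = max(indices, key=lambda i: comp[i])
# Removing the outlier leaves the most compact set (smallest MSD).
outlier = min(indices, key=lambda i: comp[i])
# Rank frames from high- to low-density.
ranking = sorted(indices, key=lambda i: comp[i], reverse=True)

print("set MSD:", msd(frames))
print("medoid:", medoid, "outlier:", outlier, "ranking:", ranking)
```

This sketch recomputes the set value from scratch for every left-out frame, which keeps it short; a faster version could subtract each frame from running sums instead.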

ramirandaq, Sep 25 '23 14:09

> gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame.

Yes, I understand that. Let me be more clear.

The ExtendedSimilarity::Comparison() function is most like gen_sim_dict. The ExtendedSimilarity::CalculateCompSim() function (which is what the extendedcomp command, i.e. the Exec_ExtendedComparison class, uses under the hood) is more like calculate_comp_sim. Let me know if you have any more questions.

drroe, Sep 25 '23 15:09

Sounds great! The functionality in bts.py is a bit more general, because it accommodates extended indices and MSD in a more general way, but this is perfect.

ramirandaq, Sep 25 '23 15:09