rmsd
rmsd copied to clipboard
Strange RMSD values
I calculate the RMSD value for two xyz files (opt.xyz and sp.xyz) with all possible methods, and the smallest value I take for a specific example is 2.1. Then I run ''calculate_rmsd -p opt.xyz sp.xyz'' and take the modified sp coordinates (sp_modified.xyz). When calculating the rmsd between opt.xyz and sp_modified.xyz, I take a value of 0.4.
Why is that? Shouldn't the calculation produce the same results?
Hi @livaschar , this is unwanted but expected behavior. It is because of the way the reordering is working. Trying to match atoms with atoms can be very approximate. Especially when the atom environments are very similar.
For example, if you try
calculate_rmsd --reorder --reorder-method hungarian sp.xyz sp_modified.xyz # 1.98
calculate_rmsd --reorder --reorder-method distance sp.xyz sp_modified.xyz # zero
calculate_rmsd --reorder --reorder-method hungarian sp.xyz sp.xyz # zero
you will see when you try to reorder, it is based on distance between atom pairs, and might not give the correct match.
Unfortunately, the Hungarian algorithm requires that the structures are already aligned to be a good fit. Which in your case happens after your initial alignment, which is why you see a good fit here.
calculate_rmsd --reorder opt.xyz sp_modified.xyz # 0.47
There is a missing --reorder-method based on a atom descriptor model called --reorder-method qml which might solve your issue and I am working on getting that python package up again.
But in essence your problem is you are trying to match your atoms on a very symmetric molecule, which is super hard to do. You might want to checkout rdkit for a chem-informatics approach, as this package does not use bonds to align.
Thanks for your quick reply, @charnley! The thing that I don't fully get is what the coordinates of the 'calculate_rmsd -p' command are. Which method is used to reorder the molecule? Hungarian, distance, etc., or is it something else and I got it wrong?
I will appreciate it if you notify us when qml is up and running. Thank you in advance.
@livaschar The default reorder algorithm is the Hungarian one, see for instance README.rst. Flag -p and the more verbose --print report the coordinates of the then already aligned structure B. I don't know if one can access the atomic coordinates after the resort yet prior the alignment because so far this wasn't my interest while using the script.
@nbehrnd so if the coordinates printed by '-p' flag are the ones coming from the Hungarian algorithm, shouldn't the two cases
calculate_rmsd --reorder --reorder-method hungarian opt.xyz sp.xyz
calculate_rmsd --reorder --reorder-method hungarian opt.xyz sp_modified.xyz
produce the same results, since the 'sp_modified.xyz' is just the final modified version of the 'sp.xyz'?
@livaschar If I compare the structures sp.xyz and sp_modified.xyz as such outside the script with Jmol at the level of .xyz files -- only the atomic positions -- the two structures can be superimposed nicely. For Jmol's tolerances and eventual metric, the RMSD drops from initial 4.68 to 0.0. This supports your argument the results of a comparison of opt.xyz vs sp.xyz and of a comparison opt.xyz vs sp_modified.xzy should yield very similar (if not identical, ideally) RMSD.
As mentioned by @charnley, the restrain to use atomic positions (only) to describe a structure and highly symmetrical structures to compare with each other show a limitation of an algorithm / the implementation of an algorithm. For me, calculate_rmsd.py remains a valuable tool (or: how often does one compare two structures in C2 symmetry, or higher?).