mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

topologyattrs._Connection.get_atoms reinitializes the TopologyGroup every time

Open jaclark5 opened this issue 2 years ago • 2 comments

Expected behavior

One would expect that when a universe is loaded, the TopologyGroups would be initialized once.

Actual behavior

Every time topologyattrs._Connection.get_atoms is called, it reinitializes the TopologyGroup. This occurs every time AtomGroup.bonds is called. This slows down calling the bond types significantly. This relates to the work around used in MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.get_donors() where line 503 states:

        # We're using u._topology.bonds rather than u.bonds as it is a million
        # times faster to access. This is because u.bonds also calculates
        # properties of each bond (e.g bond length). See:
        # https://github.com/MDAnalysis/mdanalysis/issues/2396#issuecomment-596251787

This is also the case in MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis._get_dh_pairs()

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB
from MDAnalysis.tests.datafiles import PSF, DCD
universe = mda.Universe(PSF, DCD)
print("ID {} should equal ID {}".format(id(universe.atoms.bonds),id(universe.atoms.bonds)))
print("ID {} should equal ID {}".format(id(universe.bonds),id(universe.bonds)))

Current version of MDAnalysis

  • Which version are you using? '2.5.0-dev0'
  • Which version of Python (python -V)? 3.9.13
  • Which operating system? MacOS

jaclark5 avatar May 10 '23 12:05 jaclark5

Something like a cache would be easy to add and would be a great QoL improvement.

richardjgowers avatar May 10 '23 18:05 richardjgowers

Thanks for raising this crucial performance issue and for providing such clear context, especially the link to the existing workaround in HydrogenBondAnalysis! You've correctly identified the root cause: the lack of caching in the public accessor (likely topologyattrs._Connection.get_atoms) means the TopologyGroup is being unnecessarily reinitialized every time .bonds is called. This is a significant design flaw that must be fixed before the 3.0 release. The correct solution is to implement lazy initialization and proper caching for all topology connection properties (bonds, angles, dihedrals) on the $\text{Universe}$ and $\text{AtomGroup}$ objects. This will ensure they are only instantiated once, making the public $\text{API}$ access fast and eliminating the need for internal workarounds.I can investigate implementing this caching mechanism for the 3.0 branch to fix this major bottleneck

ChinmayRout9040895625 avatar Oct 25 '25 17:10 ChinmayRout9040895625