topologyattrs._Connection.get_atoms reinitializes the TopologyGroup every time
Expected behavior
One would expect that when a universe is loaded, the TopologyGroups would be initialized once.
Actual behavior
Every time topologyattrs._Connection.get_atoms is called, it reinitializes the TopologyGroup. This occurs every time AtomGroup.bonds is called.
This slows down calling the bond types significantly. This relates to the work around used in MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.get_donors() where line 503 states:
# We're using u._topology.bonds rather than u.bonds as it is a million
# times faster to access. This is because u.bonds also calculates
# properties of each bond (e.g bond length). See:
# https://github.com/MDAnalysis/mdanalysis/issues/2396#issuecomment-596251787
This is also the case in MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis._get_dh_pairs()
Code to reproduce the behavior
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB
from MDAnalysis.tests.datafiles import PSF, DCD
universe = mda.Universe(PSF, DCD)
print("ID {} should equal ID {}".format(id(universe.atoms.bonds),id(universe.atoms.bonds)))
print("ID {} should equal ID {}".format(id(universe.bonds),id(universe.bonds)))
Current version of MDAnalysis
- Which version are you using? '2.5.0-dev0'
- Which version of Python (
python -V)? 3.9.13 - Which operating system? MacOS
Something like a cache would be easy to add and would be a great QoL improvement.
Thanks for raising this crucial performance issue and for providing such clear context, especially the link to the existing workaround in HydrogenBondAnalysis! You've correctly identified the root cause: the lack of caching in the public accessor (likely topologyattrs._Connection.get_atoms) means the TopologyGroup is being unnecessarily reinitialized every time .bonds is called. This is a significant design flaw that must be fixed before the 3.0 release. The correct solution is to implement lazy initialization and proper caching for all topology connection properties (bonds, angles, dihedrals) on the $\text{Universe}$ and $\text{AtomGroup}$ objects. This will ensure they are only instantiated once, making the public $\text{API}$ access fast and eliminating the need for internal workarounds.I can investigate implementing this caching mechanism for the 3.0 branch to fix this major bottleneck