mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Collective analysis of different analysis classes on the same trajectory

Open PicoCentauri opened this issue 2 years ago • 6 comments

Disclaimer: I thought there was an issue in the past with this idea but I was not able to find it. Sorry if the following idea is a duplication

Is your feature request related to a problem?

If one wants to run several analyses on the same trajectory one has to loop through the trajectory several times. For example, one naive way of analyzing two RDFs is the following:

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF
from MDAnalysis.analysis.base import AnalysisCollection

# Load simulation results
u = mda.Universe('topol.tpr','traj.trr')

# Select atoms
O = u.select_atoms('name O')
H = u.select_atoms('name H')

# Create individual analysis objects
rdf_OO = InterRDF(O,O)
rdf_OH = InterRDF(O,H)

# Run the individual analysis
rdf_OO.run(start=0, end=100, step=10)
rdf_OH.run(start=0, end=100, step=10)

# Results are stored in indivial objects
print(rdf_OO.results)
print(rdf_OH.results)

I know that there is the InterRDF_s class which exactly does this job. But, the problem of a collective analysis loop holds for any analysis class and I only chose the RDF since it a straightforward example. In addition, a collected Analysis could make InterRDF_s superfluous.

Describe the solution you'd like

I would introduce an AnalysisCollection(*analysis_object) class. When creating an object one can register several analysis_objects as arguments. The run method of AnalysisCollection loops through the trajectory and calls the single_frame method of each analysis_object in every frame. Results are stored in each method. The code from above will simplify to

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF
from MDAnalysis.analysis.base import AnalysisCollection

# Load simulation results
u = mda.Universe('topol.tpr','traj.trr')

# Select atoms
O = u.select_atoms('name O')
H = u.select_atoms('name H')

# Create individual analysis objects
rdf_OO = InterRDF(O,O)
rdf_OH = InterRDF(O,H)

# Create collection for common trajectory loop
collection = AnalysisCollection(rdf_OO, rdf_OH)

# Run the collected analysis
collection.run(start=0, end=100, step=10)

# Results are stored in indivial objects
print(rdf_OO.results)
print(rdf_OH.results)

I think this is approach is fairly easy to implement since the managing of the data remains in the name space of each analysis_object .

Describe alternatives you've considered

Keep everything as it is.

PicoCentauri avatar Mar 21 '22 14:03 PicoCentauri

We had a discussion about this some time ago, I thought I had a PR open about doing an analysis collection somewhere but I might have closed it or just never gotten around to it.

I can re-open it if we want to come back to this.

IAlibay avatar Mar 21 '22 14:03 IAlibay

@MDAnalysis/coredevs are there any opinions or ideas? I will work on this if we want to have this and decide on a strategy.

PicoCentauri avatar May 16 '22 09:05 PicoCentauri

I was going to open that PR but I'm swamped with 2.2.0 edit: so I guess at this point I'm going to have to give up on it..

I'd appreciate it if we did that (releasing 2.2.0 and all the associated work around it) as a priority.

IAlibay avatar May 16 '22 09:05 IAlibay

@PicoCentauri Sorry I missed this the first time round. This is a great idea, and I am anticipating that this pattern of only looping through the trajectory once will be much faster than two loops. Problems I can foresee:

  • this assumes that each Analysis class has the "base" run method (can check this through inspection)
  • this assumes that an analysis method doesn't alter the Timestep (or Universe). This is harder to check, but you could copy Timestep and "restore" it between different Analysis classes' _single_frame. E.g. if AnalysisClass 1 rotates the system as part of its analysis, then AnalysisClass2 is getting a different input than expected.

I think somewhere we talked about having an __add__ method even for combining classes (this seems like a bad idea now), but maybe going back through the original design discussions of AnalysisBase would be useful.

richardjgowers avatar May 16 '22 09:05 richardjgowers

No problem! Your second point is indeed something I haven't thought about. However, altering the timestep could be something one wants to do. E.g. if AnalysisClass 1 unwraps the systems, which could be a costly operation, AnalysisClass 2 does not have to do it again.

I am in favor of not having an __add__ method. The AnalysisBase is clean and looks more or less structured right now.

PicoCentauri avatar May 16 '22 10:05 PicoCentauri

@IAlibay if you still have your code it would be nice starting point.

PicoCentauri avatar May 17 '22 16:05 PicoCentauri