mdanalysis
mdanalysis copied to clipboard
inconsistent storage of `times` and `frames` attributes in `AnalysisBase` classes
I'm pretty new to using AnalysisBase to write my own custom analysis classes, and I found it confusing that frames and times are handled differently in different classes.
Using the NewAnalysis example in the docs (https://docs.mdanalysis.org/stable/documentation_pages/analysis/base.html#MDAnalysis.analysis.base.AnalysisBase),
if you run NewAnalysis.run(), frames and times will be in NewAnalysis.frames and NewAnalysis.times.
If you use AnalysisFromFunction, then frames and times will be in results.frames and results.times, even though they both use AnalysisBase
Is this intended?
Expected behavior
I expected to find frames and times in results.frames and results.times in both cases
Actual behavior
results.frames throws AttributeError: 'Results' object has no attribute 'frames' when inheriting from AnalysisBase, but not when using AnalysisFromFunction
Current version of MDAnalysis
- Which version are you using? (run
python -c "import MDAnalysis as mda; print(mda.__version__)") 2.7.0 - Which version of Python (
python -V)? 3.11.9 - Which operating system? ubuntu 22.04
Hi @edisj, times and frames are not strictly speaking results, so they are not part of the results attribute in AnalysisBase, but they are their own attribute.
In the AnalysisFromFunction they are also their own attribute, that you can access as usual:
rog = AnalysisFromFunction(radgyr, u.trajectory,
protein, protein.masses,
total_mass=np.sum(protein.masses))
rog.run()
print(rog.times, rog.frames)
So the .times and .frames attributes are consistent between the two classes.
However, in the AnalysisFromFunction, these attributes appear to be added to the .results attribute as well:
https://github.com/MDAnalysis/mdanalysis/blob/347a0c0d31ee850d5a41bad5a49803bc9795f580/package/MDAnalysis/analysis/base.py#L539-L540
I agree this inconsistency might be confusing.
I expected to find frames and times in
results.framesandresults.timesin both cases
Since this is your expectation, would you find it better to have them in results also for AnalysisBase?
Hi @RMeli, thanks for the clarification! I ended up just adding them in my conclude method like this and it works exactly how I wanted
def _conclude(self):
self.results.frames = self.frames
self.results.times = self.times
Since this is your expectation, would you find it better to have them in results also for AnalysisBase?
Yes, personally I think .times and .frames should be part of the results
what I naturally think "times" and "frames" mean in the context of an analysis are the time values and frame numbers that correspond to the results of the analysis, so in some sense they are kind of like implicit results
The documentation from AnalysisFromFrunction:
Attributes
----------
results.frames : numpy.ndarray
simulation frames used in analysis
results.times : numpy.ndarray
simulation times used in analysis
is more in line with how I think it should behave
versus AnalysisBase:
Attributes
----------
times: numpy.ndarray
array of Timestep times. Only exists after calling
:meth:`AnalysisBase.run`
frames: numpy.ndarray
array of Timestep frame indices. Only exists after calling
:meth:`AnalysisBase.run`
What do you think?
I found it confusing that ~~AnalysisBase~~ AnalysisFromFunction added times and frames to results. I am very much used to the idea that any analysis class has times and frames as these are "not results" (even though it may not make a ton of sense for any class that aggregates such as RMSF, Density or similar).
Removing times and frames from AnalysisBase is a huge break. Adding them to AnalysisBase.results is easy but the redundancy feels dirty.
It would be interesting to hear other opinions.
@PicoCentauri you have looked at AnalysisBase a lot – any opinions?
@marinegor are there any implications for parallel analysis?
EDIT: I had confused AnalysisBase and AnalysisFromFunction
found it confusing that AnalysisBase added times and frames to results.
@orbeckst from my understanding it's the other way around? AnalysisBase doesn't have times and frames is results but AnalysisFromFunction does?
I think my 2 cents here is that AnalysisFromFunction is wrong - times and frames isn't a results
@marinegor are there any implications for parallel analysis?
Probably not, though if frames aren't numpy ndarrays, it could influence serialization.
But in general, I also oppose adding them to results since they aren't results, and were there even before the run started.
The majority opinion appears to be to deprecate results.times and results.frames as produced by AnalysisFromFunction and then remove for 3.0 for consistency.