mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

define method calc_representative() for averages of auxiliary data in the AUXReader

Open BFedder opened this issue 2 years ago • 4 comments

Is your feature request related to a problem?

The AuxReaders have a calc_representative method which returns a representative value of the auxiliary data for any given time step. It supports two modes, either returning the auxiliary data of the AuxStep closest to the trajectory time step, or an average of AuxSteps within a certain cutoff around the trajectory time step.

With the implementation of an AuxReader for EDR files in PR #3749, a new way to store auxiliary data within the readers was needed. Previously, data was stored in plain NumPy arrays (for example for the XVG reader). Now, a dictionary of NumPy arrays is also possible. These two structures require different treatment to obtain average values. This issue is opened to facilitate the discussion on where the relevant method(s) should be defined.

Possible Solutions

Current implementation: Define method in auxiliary.base.

In this solution, calc_representative handles both cases correctly via a try/except statement to distinguish between the plain NumPy array and the dictionary of arrays. It works, but it is somewhat inelegant. @orbeckst and @IAlibay have rightfully pointed out that while this solution is fine if there is only two internal data structures, the solution will quickly become unmanageable if and when more cases need to be handled.

Treat plain NumPy array as base case, override in EDRReader

In an alternative solution, the calc_representative method would be reverted to its state pre-EDRReader. It fully supports the XVGReader and other potential readers that store data in plain NumPy arrays. Other readers like the EDRReader would need to override this method to work properly.

Force implementation in each AuxReader, raise NotImplementedError in base class

Finally, the methods for calculating average values could also be a requirement in each Reader and moved out of the base class completely.

Additional context

For more context see #3749, in particular orbeckst's comment

BFedder avatar Sep 14 '22 15:09 BFedder

The situation might become clearer once we have more aux readers. I think we should be guided by reducing code duplication, having clear code paths, and maintainability.

Currently I’m leaning towards having the simple array method in base and anything more complicated should provide its own. However, this might come with a cost for developers who might think that they can only use the simple array-type implementation.

orbeckst avatar Sep 16 '22 00:09 orbeckst

My instinct is to have the default behaviour be a "NumPy-like" behaviour in a base class with an override in each AUXReader for each specific case.

However I totally agree with @orbeckst that this will become clearer as more AuxReaders are written. I am ok with it being the way it is for now.

hmacdope avatar Sep 18 '22 23:09 hmacdope

Because adding new readers is probably going to be a slow process I'd just go for the NotImplementedError in the short term and review it in the long term.

This avoids semver level issues where you make an early decision and then a year in realise it's sub-optimal and have to go back and fix things in the next major version.

IAlibay avatar Sep 18 '22 23:09 IAlibay

I am swayed by https://github.com/MDAnalysis/mdanalysis/issues/3830#issuecomment-1250411497 : under semantic versioning it's hard to remove a method from a base class. Let's start out with each AUXReader implementing their own and the base class raising NotImplementedError.

orbeckst avatar Sep 22 '22 16:09 orbeckst

i believe that The AuxReaders module in MDAnalysis has a method called calc_representative() which returns a representative value of the auxiliary data for any given time step. [It supports two modes, either returning the auxiliary data of the AuxStep closest to the trajectory time step, or an average of AuxSteps within a certain cutoff around the trajectory time step]

Priyanshuken18 avatar May 03 '23 19:05 Priyanshuken18