mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

change nucleicacids results dict

Open orbeckst opened this issue 3 years ago • 1 comments
trafficstars

Is your feature request related to a problem?

The new experimental nucleicacids analysis module contains the https://github.com/MDAnalysis/mdanalysis/blob/c01d82bb86b480c8b2ae87f36497e7fc0a726c84/package/MDAnalysis/analysis/nucleicacids.py#L61 class as its base class. It presents results in a Results dict. The content is different from what other AnalysisBase-based classes use in that

  • it contains "times" which duplicates the .times attribute coming from AnalysisBase
  • has integers as additional keys where "first index is selection, second index is time"

Specifically, for each of the selections, a dict is stored https://github.com/MDAnalysis/mdanalysis/blob/c01d82bb86b480c8b2ae87f36497e7fc0a726c84/package/MDAnalysis/analysis/nucleicacids.py#L134-L135 and res_dict comes from https://github.com/MDAnalysis/mdanalysis/blob/c01d82bb86b480c8b2ae87f36497e7fc0a726c84/package/MDAnalysis/analysis/nucleicacids.py#L128-L129 where dist is an array of distances.

The "selections" are the individual base pairs so in order to query the distance of base pair j at time frame t I would use results[j][t].

Please correct me, @ALescoulie , if I misrepresented anything here.

Describe the solution you'd like

@IAlibay 's suggestion (from https://github.com/MDAnalysis/mdanalysis/pull/3735#pullrequestreview-1020755173) :

would it make sense to consider switching this to instead be results.pair_distances (maybe even making it a pairs x times 2D ndarray?

Additionally, I suggest to transpose so that the first axis corresponds to time frame and the second axis to pair index:

results.pair_distances[:, j]    # time series of pair j
results.pair_distances[-1]      # all distances at the last time frame

The "one observation per row" layout is canonical and also used e.g. in the RMSD results.

Describe alternatives you've considered

Don't change anything... but this makes this module behave and feel different from the others and tools such as mdacli might have a hard time working with it — comments by @PicoCentauri and @joaomcteixeira especially welcome on how to best structure results.

Additional context

  • https://github.com/MDAnalysis/mdanalysis/pull/3735#pullrequestreview-1020755173
  • PR #3611 where nucleicacids was implemented
  • I am willing to ignore semver here and change Results and anything else that needs changing in a breaking manner between 2.2 and 2.3, based on the fact that the module is in its infancy and is effectively experimental and likely has not had much uptake yet, so we're not going to hurt many users. It's much more important to define the API well (and not be bogged down with deprecations and crud code).

orbeckst avatar Jun 30 '22 23:06 orbeckst

I'm moving this to target 2.4.0 instead.

IAlibay avatar Aug 27 '22 19:08 IAlibay