mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

AuxReader for EDR Files

Open BFedder opened this issue 2 years ago • 7 comments

Fixes #3629

With ~~panedrlite now closer to~~ pyedr now being a thing, and the actual code itself less likely to still change, I have now finally started work on the EDR auxiliary reader that I want to implement as part of my GSoC project. Currently, this is just a skeleton, but it already allows EDR data to be read and assigned to AuxStep instances.

To Do:

  • [x] Finish implementation, allow similar features as XVGReader
  • [x] Write tests
  • [x] Improve documentation (currently barely there)
  • [x] Write self-contained example in documentation
  • [x] Change the add_auxiliary method to accommodate EDR data

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

BFedder avatar Jul 09 '22 11:07 BFedder

Hello @BFedder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 171:80: E501 line too long (88 > 79 characters) Line 174:80: E501 line too long (86 > 79 characters) Line 175:80: E501 line too long (91 > 79 characters)

Line 56:80: E501 line too long (91 > 79 characters) Line 57:80: E501 line too long (91 > 79 characters) Line 528:1: E402 module level import not at top of file

Comment last updated at 2022-09-12 10:12:49 UTC

pep8speaks avatar Jul 09 '22 11:07 pep8speaks

Codecov Report

Base: 94.31% // Head: 94.33% // Increases project coverage by +0.02% :tada:

Coverage data is based on head (828b2dc) compared to base (93c0aac). Patch coverage: 99.35% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3749      +/-   ##
===========================================
+ Coverage    94.31%   94.33%   +0.02%     
===========================================
  Files          192      193       +1     
  Lines        24781    24975     +194     
  Branches      3341     3369      +28     
===========================================
+ Hits         23371    23561     +190     
- Misses        1362     1365       +3     
- Partials        48       49       +1     
Impacted Files Coverage Δ
package/MDAnalysis/auxiliary/EDR.py 98.90% <98.90%> (ø)
package/MDAnalysis/auxiliary/XVG.py 95.61% <100.00%> (+0.07%) :arrow_up:
package/MDAnalysis/auxiliary/__init__.py 100.00% <100.00%> (ø)
package/MDAnalysis/auxiliary/base.py 94.01% <100.00%> (+2.69%) :arrow_up:
package/MDAnalysis/coordinates/base.py 94.37% <100.00%> (+0.38%) :arrow_up:
package/MDAnalysis/units.py 100.00% <100.00%> (ø)
package/MDAnalysis/lib/pkdtree.py 91.91% <0.00%> (-2.59%) :arrow_down:
MDAnalysisTests/auxiliary/base.py 89.74% <0.00%> (-1.58%) :arrow_down:
package/MDAnalysis/lib/mdamath.py 100.00% <0.00%> (ø)
MDAnalysisTests/auxiliary/__init__.py 100.00% <0.00%> (ø)
... and 4 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Jul 20 '22 17:07 codecov[bot]

I have now adapted the tests for the trajectory-independent functionality. Next step will be the association of energy data to trajectory frames.

BFedder avatar Jul 24 '22 20:07 BFedder

Hi @hmacdope @orbeckst @IAlibay @fiona-naughton, could I ask for your thoughts on this now? :) I believe it is ready

BFedder avatar Aug 04 '22 16:08 BFedder

Thanks for your review IAlibay! I will upload the first bit of reworking in a bit, and think about the other points you raised over the weekend.

BFedder avatar Aug 05 '22 15:08 BFedder

Move attaching of data from coordinates/base to auxreaders themselves. add_auxiliary shouldn’t need to know which file format is actually attached. self.add_auxiliary(AuxReader, ...) — use callback pattern in the classes.

This is now implemented, but coordinates/base will require some cleanup still.

Simplify passing of auxnames and auxterms: always require a mapping auxnames => auxterms, possibly using a dict as input (or any mapping, really).

This is partially done: auxname and auxterm have been replaced by aux_spec, which should be either a string or a dictionary that maps auxname to auxterm. For now, a string is valid here for XVGReaders so they behave as previously.

Use None as the default data selector (instead of "*")

This turns out to be a bit problematic, see the Discord GSoC channel. In a nutshell, it appears we have to either break XVGReader, be unpythonic, or use a confusing syntax were explicitly adding None causes everything to be added.

BFedder avatar Aug 13 '22 18:08 BFedder

Ahhh, I only ran the test in MDAnalysisTests/auxiliary, not the entire testsuite before committing... And here we are. I'll fix this tomorrow

BFedder avatar Aug 13 '22 18:08 BFedder

The API is now reworked as discussed, and most of your comments should be addressed now as well. Could I ask you to have another look please @hmacdope @orbeckst @IAlibay @fiona-naughton? :)

BFedder avatar Aug 15 '22 17:08 BFedder

Your blog post was very useful in explaining the high level ideas. I wanted to know what happens when auxdata=None is provided? Does it ever make sense to do this, i.e., to create an empty AuxReader?

If it doesn't make sense then should we plan on changing the API for 3.0 to make auxdata the required first argument? If so, adding comments to the code and raising an issue would be good so that we don't forget. What do people think?

orbeckst avatar Aug 23 '22 20:08 orbeckst

Makes sense to me @orbeckst

hmacdope avatar Aug 23 '22 22:08 hmacdope

Thanks, @orbeckst! I don't think it would ever make sense to provide auxdata=None, I cannot think of a use case. So I think changing the order in 3.0 is a good idea.

BFedder avatar Aug 24 '22 13:08 BFedder

@BFedder, thanks for addressing everything so quickly! Well done.

Final thing to go is just discussion around units.

hmacdope avatar Aug 26 '22 00:08 hmacdope

This new implementation now checks what units the data are in on creation of an EDRReader instance. If the units are already defined in MDAnalysis.units (pressure isn't, for example), then the data will be converted to MDA base units. This depends on a new release of pyedr, though.

BFedder avatar Sep 01 '22 16:09 BFedder

Unit handling is now part of this PR. EDRReader holds a unit_dict that stores the units of each term as read from pyedr. During initialisation, the reader attempts to convert to MDA base units. A warning is raised if it fails to do so (as is currently the case for pressures or surface tension, for example). On conversion, the entry in unit_dict is also updated.

BFedder avatar Sep 02 '22 13:09 BFedder

Also if you could resolve CHANGELOG stuff would be awesome.

hmacdope avatar Sep 04 '22 23:09 hmacdope

This should have addressed your reviews now, @orbeckst @hmacdope - thanks!

However, the mypy check is failing now and I don't understand why. Mamba raises the runtime error "RuntimeError: Found incorrect download: cudatoolkit. Aborting"

SHA256 mismatch seems to be the reason: SHA256 sum validation error. ERROR File not valid: SHA256 sum doesn't match expectation "/home/runner/conda_pkgs_dir/cudatoolkit-11.7.0-hd8887f6_10.tar.bz2" Expected: 31bad17839c35278203afad9c18ed15d7dd7dece11756aecf83368e1cbd9c850 Actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

I'm not sure how that would have happened as a result of my changes. What can I do to fix this?

BFedder avatar Sep 06 '22 16:09 BFedder

Thank you! That's addressed now as well

BFedder avatar Sep 06 '22 20:09 BFedder

@BFedder there are a bunch of PEP8 issues https://github.com/MDAnalysis/mdanalysis/pull/3749#issuecomment-1179526826 — if you could address them then that will avoid that future developers (including future you) have to deal with them.

orbeckst avatar Sep 06 '22 23:09 orbeckst

The remaining PEP8 complaints are URLs, a table, and the import done to match what's already present.

BFedder avatar Sep 07 '22 08:09 BFedder

@IAlibay you happy with moving forward with the PEP8 the way it is?

hmacdope avatar Sep 08 '22 23:09 hmacdope

@IAlibay you happy with moving forward with the PEP8 the way it is?

I'll try to give it a read through tomorrow, but technically I'm on holiday so no promises (I'm on "pressing matters or stuff that I want to play with" only atm)

IAlibay avatar Sep 09 '22 01:09 IAlibay

First review set - base maintenance stuff.

Pyedr should remain an optional dependency for now.

I see that it is being added to setup.py... honestly I don't know, here @orbeckst @hmacdope can you please run this through the business channel? I've become increasingly reluctant to add new dependencies to core (indeed my plan is to remove half the ones we already have by 3.0). I'll agree to whatever the majority agreement is.

edit: I should note, if we make it a core requirement, that means we need to ensure that we either ship it with very small test files, or we don't ship the test files at all. The ~ 20 MB tarball with the tests is not appropriate for the core library. That also needs to be done before 2.4.0 is released.

IAlibay avatar Sep 09 '22 12:09 IAlibay

Thanks for the reviews, this should address them @IAlibay @orbeckst

BFedder avatar Sep 12 '22 10:09 BFedder

I'm not sure what to do about the Cirrus CI failing, the error message seems like it's not due to any change I made... Could I ask for advice, please?

BFedder avatar Sep 12 '22 11:09 BFedder

Ping @tylerjereddy? RE cirrus? I'm not so sure either.

hmacdope avatar Sep 13 '22 12:09 hmacdope

I've restarted the task, as Tyler previous detailed somewhere else, the issue seems to be purely down to Cirrus getting hammered pretty heavily by a bunch of folks that suddenly realised they had free M1 runner available. Hopefully it should complete now.

IAlibay avatar Sep 13 '22 14:09 IAlibay

Looks good now, feel free to deactivate Cirrus if it becomes annoying of course.

tylerjereddy avatar Sep 13 '22 15:09 tylerjereddy

I have now used a defaultdict in calc_representative, moved the comments in test_edr into class docstrings, and opened issue #3830 as a place to discuss the future of calc_representative with averages.

BFedder avatar Sep 14 '22 15:09 BFedder

@IAlibay @orbeckst are we satisfied that pyedr is now an optional dependency and that your review questions have been addressed? I had a read over and am satisfied. :)

hmacdope avatar Sep 18 '22 23:09 hmacdope

This latest batch of reviews is now also addressed, and the checks pass. I believe the EDRReader is now ready to be merged, then :partying_face: Thanks everyone for your mentorship, reviews, and help over these past few months! I have learnt a lot and am looking forward to continue contributing to MDAnalysis.

BFedder avatar Sep 19 '22 10:09 BFedder