mdanalysis
mdanalysis copied to clipboard
AuxReader for EDR Files
Fixes #3629
With ~~panedrlite now closer to~~ pyedr now being a thing, and the actual code itself less likely to still change, I have now finally started work on the EDR auxiliary reader that I want to implement as part of my GSoC project. Currently, this is just a skeleton, but it already allows EDR data to be read and assigned to AuxStep instances.
To Do:
- [x] Finish implementation, allow similar features as XVGReader
- [x] Write tests
- [x] Improve documentation (currently barely there)
- [x] Write self-contained example in documentation
- [x] Change the add_auxiliary method to accommodate EDR data
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Hello @BFedder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
package/MDAnalysis/auxiliary/EDR.py
:
Line 171:80: E501 line too long (88 > 79 characters) Line 174:80: E501 line too long (86 > 79 characters) Line 175:80: E501 line too long (91 > 79 characters)
- In the file
package/MDAnalysis/auxiliary/__init__.py
:
Line 56:80: E501 line too long (91 > 79 characters) Line 57:80: E501 line too long (91 > 79 characters) Line 528:1: E402 module level import not at top of file
Comment last updated at 2022-09-12 10:12:49 UTC
Codecov Report
Base: 94.31% // Head: 94.33% // Increases project coverage by +0.02%
:tada:
Coverage data is based on head (
828b2dc
) compared to base (93c0aac
). Patch coverage: 99.35% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## develop #3749 +/- ##
===========================================
+ Coverage 94.31% 94.33% +0.02%
===========================================
Files 192 193 +1
Lines 24781 24975 +194
Branches 3341 3369 +28
===========================================
+ Hits 23371 23561 +190
- Misses 1362 1365 +3
- Partials 48 49 +1
Impacted Files | Coverage Δ | |
---|---|---|
package/MDAnalysis/auxiliary/EDR.py | 98.90% <98.90%> (ø) |
|
package/MDAnalysis/auxiliary/XVG.py | 95.61% <100.00%> (+0.07%) |
:arrow_up: |
package/MDAnalysis/auxiliary/__init__.py | 100.00% <100.00%> (ø) |
|
package/MDAnalysis/auxiliary/base.py | 94.01% <100.00%> (+2.69%) |
:arrow_up: |
package/MDAnalysis/coordinates/base.py | 94.37% <100.00%> (+0.38%) |
:arrow_up: |
package/MDAnalysis/units.py | 100.00% <100.00%> (ø) |
|
package/MDAnalysis/lib/pkdtree.py | 91.91% <0.00%> (-2.59%) |
:arrow_down: |
MDAnalysisTests/auxiliary/base.py | 89.74% <0.00%> (-1.58%) |
:arrow_down: |
package/MDAnalysis/lib/mdamath.py | 100.00% <0.00%> (ø) |
|
MDAnalysisTests/auxiliary/__init__.py | 100.00% <0.00%> (ø) |
|
... and 4 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I have now adapted the tests for the trajectory-independent functionality. Next step will be the association of energy data to trajectory frames.
Hi @hmacdope @orbeckst @IAlibay @fiona-naughton, could I ask for your thoughts on this now? :) I believe it is ready
Thanks for your review IAlibay! I will upload the first bit of reworking in a bit, and think about the other points you raised over the weekend.
Move attaching of data from coordinates/base to auxreaders themselves. add_auxiliary shouldn’t need to know which file format is actually attached. self.add_auxiliary(AuxReader, ...) — use callback pattern in the classes.
This is now implemented, but coordinates/base will require some cleanup still.
Simplify passing of auxnames and auxterms: always require a mapping auxnames => auxterms, possibly using a dict as input (or any mapping, really).
This is partially done: auxname
and auxterm
have been replaced by aux_spec
, which should be either a string or a dictionary that maps auxname
to auxterm
. For now, a string is valid here for XVGReaders so they behave as previously.
Use None as the default data selector (instead of "*")
This turns out to be a bit problematic, see the Discord GSoC channel. In a nutshell, it appears we have to either break XVGReader, be unpythonic, or use a confusing syntax were explicitly adding None causes everything to be added.
Ahhh, I only ran the test in MDAnalysisTests/auxiliary, not the entire testsuite before committing... And here we are. I'll fix this tomorrow
The API is now reworked as discussed, and most of your comments should be addressed now as well. Could I ask you to have another look please @hmacdope @orbeckst @IAlibay @fiona-naughton? :)
Your blog post was very useful in explaining the high level ideas. I wanted to know what happens when auxdata=None
is provided? Does it ever make sense to do this, i.e., to create an empty AuxReader?
If it doesn't make sense then should we plan on changing the API for 3.0 to make auxdata
the required first argument? If so, adding comments to the code and raising an issue would be good so that we don't forget. What do people think?
Makes sense to me @orbeckst
Thanks, @orbeckst!
I don't think it would ever make sense to provide auxdata=None
, I cannot think of a use case. So I think changing the order in 3.0 is a good idea.
@BFedder, thanks for addressing everything so quickly! Well done.
Final thing to go is just discussion around units.
This new implementation now checks what units the data are in on creation of an EDRReader instance. If the units are already defined in MDAnalysis.units (pressure isn't, for example), then the data will be converted to MDA base units. This depends on a new release of pyedr, though.
Unit handling is now part of this PR. EDRReader
holds a unit_dict
that stores the units of each term as read from pyedr
. During initialisation, the reader attempts to convert to MDA base units. A warning is raised if it fails to do so (as is currently the case for pressures or surface tension, for example). On conversion, the entry in unit_dict
is also updated.
Also if you could resolve CHANGELOG stuff would be awesome.
This should have addressed your reviews now, @orbeckst @hmacdope - thanks!
However, the mypy check is failing now and I don't understand why. Mamba raises the runtime error "RuntimeError: Found incorrect download: cudatoolkit. Aborting"
SHA256 mismatch seems to be the reason: SHA256 sum validation error. ERROR File not valid: SHA256 sum doesn't match expectation "/home/runner/conda_pkgs_dir/cudatoolkit-11.7.0-hd8887f6_10.tar.bz2" Expected: 31bad17839c35278203afad9c18ed15d7dd7dece11756aecf83368e1cbd9c850 Actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
I'm not sure how that would have happened as a result of my changes. What can I do to fix this?
Thank you! That's addressed now as well
@BFedder there are a bunch of PEP8 issues https://github.com/MDAnalysis/mdanalysis/pull/3749#issuecomment-1179526826 — if you could address them then that will avoid that future developers (including future you) have to deal with them.
The remaining PEP8 complaints are URLs, a table, and the import done to match what's already present.
@IAlibay you happy with moving forward with the PEP8 the way it is?
@IAlibay you happy with moving forward with the PEP8 the way it is?
I'll try to give it a read through tomorrow, but technically I'm on holiday so no promises (I'm on "pressing matters or stuff that I want to play with" only atm)
First review set - base maintenance stuff.
Pyedr should remain an optional dependency for now.
I see that it is being added to setup.py... honestly I don't know, here @orbeckst @hmacdope can you please run this through the business channel? I've become increasingly reluctant to add new dependencies to core (indeed my plan is to remove half the ones we already have by 3.0). I'll agree to whatever the majority agreement is.
edit: I should note, if we make it a core requirement, that means we need to ensure that we either ship it with very small test files, or we don't ship the test files at all. The ~ 20 MB tarball with the tests is not appropriate for the core library. That also needs to be done before 2.4.0 is released.
Thanks for the reviews, this should address them @IAlibay @orbeckst
I'm not sure what to do about the Cirrus CI failing, the error message seems like it's not due to any change I made... Could I ask for advice, please?
Ping @tylerjereddy? RE cirrus? I'm not so sure either.
I've restarted the task, as Tyler previous detailed somewhere else, the issue seems to be purely down to Cirrus getting hammered pretty heavily by a bunch of folks that suddenly realised they had free M1 runner available. Hopefully it should complete now.
Looks good now, feel free to deactivate Cirrus if it becomes annoying of course.
I have now used a defaultdict in calc_representative
, moved the comments in test_edr into class docstrings, and opened issue #3830 as a place to discuss the future of calc_representative
with averages.
@IAlibay @orbeckst are we satisfied that pyedr is now an optional dependency and that your review questions have been addressed? I had a read over and am satisfied. :)
This latest batch of reviews is now also addressed, and the checks pass. I believe the EDRReader is now ready to be merged, then :partying_face: Thanks everyone for your mentorship, reviews, and help over these past few months! I have learnt a lot and am looking forward to continue contributing to MDAnalysis.