mdanalysis
mdanalysis copied to clipboard
ENH: add TPR position and velocity read support
Fixes gh-464.
- Add support for reading positions and velocities from GMX
.tprfiles over the full range oftpxversions supported by regular.tprtopology parsing (58through137). We've had an issue open to add this capability for almost a decade now.
Notes for reviewers:
- Suggestions for removing the use of
globalto track TPR file precision are appreciated. - Performance/duplication considerations -- as Richard noted in the cognate issue, the way our topology and coordinate handling code is organized, it may make sense to keep the separation here, but there should at least be opportunity to reduce code duplication if not completely avoid re-reading the topology to seek to the coordinate positions); that said, our
.tprreading performance is atrocious anyway and my attempts to fix at gh-4098 have stalled for more than a year, so I suggest we defer performance (and probably even duplication) considerations until after the capability and tests are cemented in. - Please help me identify TPR testing scenarios that are missing, and ideally suggest the TPR file(s) I should use to test them.
- I'll probably deal with
CHANGELOGand docs changes when it looks like we're close to ready to merge, but I still appreciate suggestions on i.e., locations where doc changes will be needed.
PR Checklist
- [x] Tests?
- [ ] Docs?
- [ ] CHANGELOG updated?
- [ ] Issue raised/referenced?
Developers certificate of origin
- [X] I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.
📚 Documentation preview 📚: https://mdanalysis--4873.org.readthedocs.build/en/4873/
Hello @tylerjereddy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
package/MDAnalysis/coordinates/TPR.py:
Line 59:80: E501 line too long (89 > 79 characters) Line 61:80: E501 line too long (93 > 79 characters) Line 75:80: E501 line too long (81 > 79 characters)
- In the file
testsuite/MDAnalysisTests/coordinates/test_tpr.py:
Line 15:80: E501 line too long (119 > 79 characters) Line 16:23: E261 at least two spaces before inline comment Line 22:5: E124 closing bracket does not match visual indentation Line 24:23: E261 at least two spaces before inline comment Line 30:5: E124 closing bracket does not match visual indentation Line 32:16: E261 at least two spaces before inline comment Line 33:19: E241 multiple spaces after ',' Line 33:33: E241 multiple spaces after ',' Line 34:20: E241 multiple spaces after ',' Line 34:34: E241 multiple spaces after ',' Line 38:5: E124 closing bracket does not match visual indentation Line 40:21: E261 at least two spaces before inline comment Line 41:19: E241 multiple spaces after ',' Line 41:33: E241 multiple spaces after ',' Line 42:19: E241 multiple spaces after ',' Line 42:33: E241 multiple spaces after ',' Line 44:20: E241 multiple spaces after ',' Line 46:5: E124 closing bracket does not match visual indentation Line 47:14: E261 at least two spaces before inline comment Line 48:19: E241 multiple spaces after ',' Line 48:33: E241 multiple spaces after ',' Line 49:20: E241 multiple spaces after ',' Line 49:34: E241 multiple spaces after ',' Line 53:5: E124 closing bracket does not match visual indentation Line 54:14: E261 at least two spaces before inline comment Line 55:19: E241 multiple spaces after ',' Line 55:33: E241 multiple spaces after ',' Line 56:20: E241 multiple spaces after ',' Line 56:34: E241 multiple spaces after ',' Line 60:5: E124 closing bracket does not match visual indentation
Comment last updated at 2024-12-31 17:39:01 UTC
Codecov Report
Attention: Patch coverage is 94.44444% with 5 lines in your changes missing coverage. Please review.
Project coverage is 93.63%. Comparing base (
d412c9a) to head (006ccb3).
Additional details and impacted files
@@ Coverage Diff @@
## develop #4873 +/- ##
========================================
Coverage 93.63% 93.63%
========================================
Files 177 178 +1
Lines 22033 22121 +88
Branches 3115 3143 +28
========================================
+ Hits 20631 20714 +83
Misses 948 948
- Partials 454 459 +5
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@tylerjereddy since this PR is composed of two new files, and the modifications are all in one place, would you mind if I go ahead with #4849? I don't think there would be significant conflicts.
ok
Thanks @tylerjereddy. It is now merged.
I drafted in support/testing for handling velocities as well, plus support/testing for reading positions/velocities from one more tpx version (133). I'm trying to skip the CI when I push in for the next little bit (looks like Azure still ran though..).
Need to poke around the .tpr test files to see if there are some more interesting ones as well re: non-zero velocities, etc.
I was able to extend support and testing for .tpr position/velocity reads back as far as GMX 2023 (tpx version 129); GMX 2022rc1 (tpx 127) will fail testing because it needs more binary file striding version checks that I'll need to study in the C++ code (not too hard, but time consuming).
I also modernized an old TPR file we had with non-zero velocities (and a large number of atoms) for cobrotoxin via gmx convert-tpr and then added a test case for this. It passes for positions, but not for velocities yet, so that was a good intuition to check non-zero velocity/larger system case. Since that was converted to tpx 134 it will be a little harder to sort out the problem there, likely requiring side-by-side printf-vs-print for GMX vs. us on the binary striding.
I also modernized an old TPR file we had with non-zero velocities (and a large number of atoms) for cobrotoxin via gmx convert-tpr and then added a test case for this
This test case for non-zero velocities should be passing now, as should other tests, apart from the linter, after some more bug fixes.
The outstanding items here would likely be:
- going back to support a wider range of
tpxversions - probably a few more
tprfiles with non-zero velocities across generations should be tested
Updates from work today:
- Added support for reading positions and velocities from
tpxversion 137 (GROMACS2025.0), along with a regression test confirming position and velocity reads from a.tprfile from that version of GMX (expected values fromgmx dumpas usual) - finally fixed up the
blacklinter complaints
Added support for tpx version 127 today (GROMACS 2022)--two regression tests with different numbers of atoms were used in this case because I had to do manual binary offset analysis vs. newer versions of the format to get the positions properly extracted.
Next up from our support table should probably be tpx 122 (GROMACS 2021), maybe tomorrow.
For today, I added support for tpx version 122 (GROMACS 2021) + two regression tests. Luckily, this was easier than the last two versions I added, no byte skipping shims needed.
For tomorrow, I'll see if I can add support for tpx version 119 (GROMACS 2020).
Today, added support for tpx version 119 (GROMACS 2020), which required minor byte striding adjustments to pass the usual two regression tests.
For tomorrow, I'll aim to extend backward to tpx 116 (GROMACS 2019).
Today, support was added for tpx 116 (GMX 2019) and tpx 112 (GMX 2018). The latter required some non-trivial byte-stride analysis on the binary files.
For tomorrow, I'll take aim at tpx 110 (GMX 2016) if I can.
Today, added support for tpx 110 (GROMACS 2016) and tpx 103 (GROMACS 5.1).
I'll try to check tpx 100 (GMX 5.x before 5.1) tomorrow.
Today, added support for tpx version 100 (GMX 5.x before 5.1). No byte striding shims were needed.
Next up is tpx version 83 (GMX 4.6.x)--I did a quick check and that will require more analysis/byte striding.
Ok, this weekend I added position and velocity reading support for all remaining tpx versions that MDAnalysis can read topology from under normal circumstances.
I think this is ready for review now--I've updated the original comment above with some notes for reviewers. The linter will currently fail because I'm using a global construct to track TPR precision.
Suggestions for fixing that and anything else you find are appreciated.
@orbeckst I took at stab at addressing your and my review comments, after having to rebase to deal with gh-5061. Pasting from the commit message below the fold:
Summary of latest revisions to PR 4873
DOC, TST, BUG: PR 4873 revisions
* Add TPR position and velocity reading support to the
coordinate reading support table in our docs, as requested
by reviewer.
* Add a regression test for the proper raising of an error
when we use a proper TPR topology file, but an improper format
TPR coordinate file. This also revealed a bug in `TPR.py`
where the `logger` was not initialized, which is now fixed
and guarded by the same regression test. Note that this
new regression test does correctly distinguish the error
message for an improper coordinate TPR, which is distinct
from an improper topology TPR (I adjusted the error message
to mention coordinate parsing specifically).
* It may be worth noting that before dealing with the above
point, I hadn't thought much about the `u = mda.Universe(tpr_file, tpr_file)`
incantation vs. extracting topology and coordinates from the single
argument. That said, the separated-argument incantation does appear to
work just fine as well, so we could consider i.e., parametrizing our TPR
coordinate/velocity testing infrastructure to probe both incantations
if desirable.
* The use of a `global` variable for storing tpx precision data
has been removed in favor of adjuting `do_mtop()` to accept
this information as a new argument. This addresses one of my
own review comments (and a linter complaint about globals).
One remaining action item that your comments got me thinking about is the incantations like u = mda.Universe(tpr_file, tpr_file). This does seem to work just fine, as the detailed review comment below the fold indicates, but there is perhaps a question of wanting to test that incantation as well. I could parametrize the current test to probe both incantations I suppose.
Anything else that we are worried about? Possibly assessing performance impact, though I believe we may be in agreement that going for the new functionality first and then dealing with performance issues later may be "ok."
Super-quick comment: Looks as if you addressed all concerns. I'd merge correct code even if there's still optimization to be done. However, is there any performance regression for the cases that TPRs had been used for in the past, i.e., do your changes incur an overhead even if x/v are not needed as in Universe(TPR, XTC)?
Checking that u = mda.Universe(tpr_file, tpr_file) and u = mda.Universe(tpr_file) both work would be good.
Another thing I didn't test, but that actually works, is different versions of TPR for topology vs. x/v reading: u = mda.Universe(TPR2020, TPR2024_4). That's perhaps slightly mind bending, but I suppose there is no reason it shouldn't work. I'm not sure how far I'd want to go in officially supporting/testing that at this time, but just noticed it while addressing comments locally. Maybe I'll just add a test for a single case like that for now.
Yes, add a one off test, maybe it breaks in interesting ways in the future and reveals a real issue. Might eventually be useful when we optimize towards reading only once instead of separately for topology and x/v.
I pushed in a bunch more revisions with a detailed commit message. Probably some kind of linter/docstring issue will crop up now.
Let me see what I can do about checking the performance implications for prior usage a bit.
I'm having trouble running the asv benchmarks locally on both Mac and Linux. The sample benchmarks I drafted are below the fold (looks like this file may have annoyances with carriage returns as well?). My sample incantation was: asv continuous -E virtualenv -v -e -b "TPRBench" develop treddy_issue_464
--- a/benchmarks/benchmarks/topology.py
+++ b/benchmarks/benchmarks/topology.py
@@ -4,7 +4,8 @@ from MDAnalysis.guesser import DefaultGuesser
try:
from MDAnalysis.exceptions import NoDataError
- from MDAnalysisTests.datafiles import GRO
+ from MDAnalysisTests.datafiles import (GRO, TPR_xvf_2024_4,^M
+ TPR2023, TPR406)^M
except:
pass
@@ -45,3 +46,26 @@ class BondsBench(object):
def time_bonds(self, num_bonds):
"""Benchmark for calculating bonds"""
self.u.bonds
+^M
+^M
+class TPRBench:^M
+ """^M
+ Benchmarks for reading in TPR files, for which we^M
+ support both topology and coordinate/velocity parsing.^M
+ """^M
+ # benchmark over a variety of TPR versions and^M
+ # system sizes^M
+ params = (TPR_xvf_2024_4, # 19385 atoms^M
+ TPR2023, # 2263 atoms^M
+ TPR406, # 2263 atoms^M
+ )^M
+ param_names = ["tpr_file"]^M
+^M
+^M
+ def time_single_incantation_tpr(self, tpr_file):^M
+ MDAnalysis.Universe(tpr_file)^M
+^M
+^M
+ def time_double_incantation_tpr(self, tpr_file):^M
+ # only possible from MDA 2.10.0 onward^M
+ MDAnalysis.Universe(tpr_file, tpr_file)^M
I see Failed to build the project and import the benchmark suite. shortly after Successfully built MDAnalysis in a verbose log... don't have time to dig further on that at the moment. If other folks regularly run the benchmarks in a venv setup I'm open to suggestions... it looks like asv.conf.json may also need some modernization (some projects need to customize specific build/install cmds in there too).
Just after reviewing I looked at the docs and realized that the new TPR coordinate docs were not linked. I quickly added the files... hopefully all looks good.
Congratulations @tylerjereddy for closing a decade-old issue!
(Btw, I carefully preserved and edited your log messages in the squash-commit as they contain a plethora of useful information.)