mdanalysis
mdanalysis copied to clipboard
Implements compact wrapping and nojump, and updates associated transforms.
Implemented function distances.apply_compact_PBC. With it I could easily adapt AtomGroup.wrap() to also do compact wrapping, depending on a keyword. Backward compatibility is maintained.
Also optimized the way compound_indices are handled in AtomGroup.wrap(), with major speedup:
For an AtomGroup of 146k atoms:
Before
%timeit ag.wrap(compound='fragments', inplace=False)
1.08 s ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
After
%timeit ag.wrap(compound='fragments', inplace=False)
72.7 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In transformations:
-
Made
transformations.wrapandtransformations.unwraptransparently expose AtomGroup.wrap() and AtomGroup.unwrap(), to take advantage of the new functionality. -
Added
transformations.nojump(should this be a separate PR?)
Also added the center keyword to distances.apply_PBC (with care to minimally impact performance), which allows the PBC wrapping to an arbitrarily centered box. This helps with vector treatment in transformations.nojump, and can be useful in other cases (say, to wrap atoms to a box centered on the origin or on a dynamic point). Should AtomGroup.wrap() expose this capability too?
Specific tests still to be written, but would like feedback in the meantime. Existing tests pass, except for the seemingly unrelated testsuite/MDAnalysisTests/analysis/test_pca.py:test_given_mean and testsuite/MDAnalysisTests/analysis/test_encore.py:TestEncoreClustering.test_clustering_three_ensembles_two_identical (which also fail for the unmodified develop branch).
Feedback much appreciated!
PR Checklist
- [ ] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [ ] Issue raised/referenced?
(There's no issue for these enhancements. Should I open one?)
Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
package/MDAnalysis/core/groups.py:
Line 1488:80: E501 line too long (80 > 79 characters)
- In the file
package/MDAnalysis/transformations/wrap.py:
Line 211:80: E501 line too long (80 > 79 characters)
Comment last updated at 2020-10-14 13:44:37 UTC
Codecov Report
Merging #2982 into develop will decrease coverage by
0.24%. The diff coverage is31.18%.
@@ Coverage Diff @@
## develop #2982 +/- ##
===========================================
- Coverage 93.05% 92.80% -0.25%
===========================================
Files 186 186
Lines 24609 24679 +70
Branches 3187 3197 +10
===========================================
+ Hits 22900 22904 +4
- Misses 1661 1727 +66
Partials 48 48
| Impacted Files | Coverage Δ | |
|---|---|---|
| package/MDAnalysis/lib/mdamath.py | 100.00% <ø> (ø) |
|
| package/MDAnalysis/lib/distances.py | 90.77% <21.21%> (-7.64%) |
:arrow_down: |
| package/MDAnalysis/transformations/wrap.py | 33.89% <24.00%> (-66.11%) |
:arrow_down: |
| package/MDAnalysis/core/groups.py | 98.57% <100.00%> (-0.01%) |
:arrow_down: |
| package/MDAnalysis/transformations/__init__.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update a9543ba...85d566b. Read the comment docs.
I didn't have the time to review, sorry. But: >10 times speedup for wrapping/unwrapping — that will finally make these transformations usable 🚀 . Also, getting compact and nojump will be super-neat.
I would break up the PR – one for the performance, one or more PRs for new features. The smaller the better, really.
If the performance one can be separate from everything else then we can easily backport it.
Perfect, I'll split it up, then. Just to be clear, I only optimized the wrapping. Was actually right now looking at the unwrapping and think we can look at some 5x speedup for a 100k atom system. Anyway, we can discuss better in the respective PRs.
Also, the atom-based wrapping is already quite fast (milliseconds for 100k atoms). What I optimized was the fragment based wrapping.
The fragment-based wrapping is the crucial one for making sane trajectories.... so any improvements there are very welcome!
Thanks for the positive outlook, @orbeckst. To make this work I propose some refactoring to the compound and cache code in #3000 and #3005, and further unwrap-specific optimizations in #2376. I really need some approval of those ideas moving forward, because they involve some work and touch some central aspects that others might want to chip in.
nojump would be really good for MSD (@hmacdope recently told someone on the mailing list that they had to use GROMACS for that part) and compact has always been one my favorites in trjconv. Having both would boost transformations.
I notice that all the prerequisite PRs have been merged so in principle we "just" need (1) merging of develop into this PR and (2) a few 👀 on the code. It would be a shame to let it rot.
@mnmelo there is now an nojump method available, and this PR is quite stale. Is it okay for me to close here?
Sure!
We have the nojump but IIRC a major contribution of this PR to speed up unwrapping ("make molecules whole across PBC") which is still veeeeeeeeeeeeeeeeeery slow.
It's become very clear to me that I don't have the bandwidth to move this PR forward (apologies) but I am also not sure that it makes sense to close it if there's code in there that we should really be working on. @hmacdope have you had a deeper look?