Feature (un)wrap enhancement
Changes made in this Pull Request:
- Optimizes unwrapping by pre-screening AG fragments that span more than half the box vectors;
- Extends the universe-validated cache mechanism to avoid re-calculating the compound splits involved in fragment unwrapping;
- The corner case of massless, single-particle compounds (virtual sites, or TIP4P dummy particles) is now handled gracefully by unwrap (addresses #3168). It does not, however, ensure that those virtual sites are unwrapped with the rest of the molecule.
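The pre-screening idea in the first bullet can be sketched like this (an illustrative, orthorhombic-box-only version; `needs_unwrap` is a hypothetical name, and the actual PR works with the box vectors so that triclinic cells are also covered): a fragment whose per-axis extent stays below half the box length cannot have been broken across the periodic boundary, because any wrapped-around piece would push its apparent extent past half the box, so such fragments can skip unwrapping entirely.

```python
import numpy as np

def needs_unwrap(positions, box_lengths):
    """Return True if a fragment's coordinates span more than half
    the box along any axis, i.e. it may be broken across the PBC and
    needs the full unwrap treatment (orthorhombic sketch only)."""
    extent = np.ptp(positions, axis=0)  # per-axis min-to-max span
    return bool(np.any(extent >= 0.5 * box_lengths))

box = np.array([100.0, 100.0, 100.0])

# A compact fragment well inside a 100 A cubic box: no unwrap needed.
compact = np.array([[10.0, 10.0, 10.0], [12.0, 11.0, 10.5]])
print(needs_unwrap(compact, box))  # False

# A fragment split across the boundary spans nearly the whole box.
split = np.array([[1.0, 50.0, 50.0], [99.0, 50.0, 50.0]])
print(needs_unwrap(split, box))    # True
```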
Performance gains already reach ~10x for large systems. For the AdK test system (from the .gro file) I got the following:
=========== ============= ============== =============
                          this PR        this PR
num_atoms   develop       (no caching)   (caching)
----------- ------------- -------------- -------------
10          561±0.9μs     431±4μs        389±1μs
100         1.13±0ms      640±3μs        596±4μs
1000        17.2±0.04ms   17.3±0.02ms    17.3±0.05ms
10000       558±3ms       170±0.6ms      171±0.8ms
All (47k)   2.17±0.01s    239±0.4ms      239±0.6ms
=========== ============= ============== =============
(To be comparable with develop, this test ignores the TIP4P massless MW particles)
Note that develop already includes fragment caching. The caching compared here is just the cache of the per-fragment index-splitting operation. That caching only seems to help marginally, but I'm not sure we should just leave it out, because I imagine some systems may be heavier in this respect.
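The "universe-validated cache" idea can be illustrated with a toy version (`ValidatedCache` and the `token` argument are hypothetical names, not the MDAnalysis implementation): a cached result is reused only while a validity token, standing in for the topology state it was computed for, still matches.

```python
class ValidatedCache:
    """Toy sketch of a validated cache: each stored value carries the
    token it was computed under and is recomputed when that changes."""

    def __init__(self):
        self._store = {}

    def get(self, key, token, compute):
        entry = self._store.get(key)
        if entry is None or entry[0] != token:
            # Cache miss or stale token (e.g. topology changed): recompute.
            entry = (token, compute())
            self._store[key] = entry
        return entry[1]

calls = []
split = lambda: calls.append(1) or [[0, 1], [2, 3]]  # pretend-expensive split

cache = ValidatedCache()
cache.get("fragments", token=0, compute=split)  # computed
cache.get("fragments", token=0, compute=split)  # cache hit, no recompute
cache.get("fragments", token=1, compute=split)  # token changed: recompute
print(len(calls))  # 2
```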
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
testsuite/MDAnalysisTests/core/test_groups.py:
Line 1566:80: E501 line too long (80 > 79 characters)
Line 1571:1: E302 expected 2 blank lines, found 1
Comment last updated at 2022-06-08 15:16:52 UTC
Codecov Report
Base: 93.50% // Head: 93.51% // Increases project coverage by +0.01% :tada:
Coverage data is based on head (7a5a0df) compared to base (130e862). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## develop #3169 +/- ##
===========================================
+ Coverage 93.50% 93.51% +0.01%
===========================================
Files 190 190
Lines 24943 24989 +46
Branches 3523 3535 +12
===========================================
+ Hits 23322 23368 +46
Misses 1100 1100
Partials 521 521
| Impacted Files | Coverage Δ |
|---|---|
| package/MDAnalysis/core/groups.py | 97.64% <100.00%> (+0.12% :arrow_up:) |
| package/MDAnalysis/core/topologyattrs.py | 95.81% <100.00%> (+0.01% :arrow_up:) |
| package/MDAnalysis/core/universe.py | 97.26% <100.00%> (ø) |
| package/MDAnalysis/lib/_cutil.pyx | 100.00% <100.00%> (ø) |
| package/MDAnalysis/lib/util.py | 89.50% <100.00%> (-0.15% :arrow_down:) |
Sadly, this speedup is still well behind gmx trjconv speeds :(
For a synthetic test with an adk.xtc consisting of 1000 frames of the same structure (adk_oplsaa.gro) I get the following timings:
echo 0 | gmx trjconv -f adk.xtc -s adk_oplsaa.tpr -pbc mol -o pbc.xtc
real 0m6.313s
user 0m4.830s
sys 0m0.114s
...
for _frame in u.trajectory:
u.atoms.unwrap()
wrt.write(u.atoms)
...
real 4m8.090s
user 5m25.783s
sys 1m25.677s
Even with the ~10x speedup of this PR we're still some ~50x slower than gmx trjconv. Profiling the same 1000-frame unwrapping with MDA gives this:
53456901 function calls (53383208 primitive calls) in 252.512 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1000 138.932 0.139 147.632 0.148 {MDAnalysis.lib._cutil.make_whole}
115538 36.884 0.000 36.884 0.000 {built-in method numpy.array}
2001 28.912 0.014 28.912 0.014 {built-in method builtins.sorted}
2001 18.361 0.009 22.284 0.011 topologyattrs.py:2310(<listcomp>)
2001 7.138 0.004 97.336 0.049 topologyattrs.py:2307(get_atoms)
25002 6.028 0.000 6.028 0.000 {method 'reduce' of 'numpy.ufunc' objects}
51071684/51024001 3.889 0.000 4.617 0.000 util.py:1542(wrapper)
1001 1.802 0.002 1.805 0.002 {method 'read' of 'MDAnalysis.lib.formats.libmdaxdr.XTCFile' objects}
1000 1.682 0.002 248.248 0.248 groups.py:1683(unwrap)
1000 1.295 0.001 1.299 0.001 {method 'write' of 'MDAnalysis.lib.formats.libmdaxdr.XTCFile' objects}
6001 0.725 0.000 98.494 0.016 topologyattrs.py:425(__getitem__)
...
It seems the bulk of the work is now down to lib._cutil.make_whole (ping @zemanj, who has a branch on this optimization from the Cython side). There are other possibly excessive calls to np.array and to sort. Because I bundled reading and writing I can't be sure what those refer to.
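Since reading, unwrapping, and writing were profiled together above, the np.array and sorted entries can't be attributed cleanly. One way around this (a hedged sketch of the technique, not what was actually done here, with stand-in workloads instead of a real Universe) is to enable the profiler only around the call of interest:

```python
import cProfile
import io
import pstats

def unwrap_step():
    """Stand-in for the u.atoms.unwrap() call we want to attribute."""
    sum(i * i for i in range(10_000))

def io_step():
    """Stand-in for trajectory reading/writing we want to exclude."""
    list(range(10_000))

prof = cProfile.Profile()
for _ in range(100):
    io_step()         # outside the enabled window: excluded from stats
    prof.enable()
    unwrap_step()     # only this call accumulates profiler stats
    prof.disable()
    io_step()

stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print("unwrap_step" in report, "io_step" in report)  # True False
```

With a real MDAnalysis run the same enable/disable pattern would isolate `u.atoms.unwrap()` from the reader and writer.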
Made a final optimization, and I think this is ready to merge.
An initial check for bonds was being made that cost about 40% of the entire unwrap call (see the call to topologyattrs.py:425(__getitem__) above). It is unneeded, because _split_by_compound_indices and make_whole do that check on their own, and because no really heavy work is done before those later checks.
Updated timings for MDA (now only ~27x slower than gmx trjconv!):
real 2m42.869s
user 4m1.193s
sys 1m24.911s
With this last optimization, 88.5% of the unwrap call is now spent running the cythonized make_whole function, which seems to better reflect Python's role as a wrapper in this case.
I crawled a bit deeper into the rabbit hole, and was a bit surprised to realize that inside make_whole, 83% of that time is spent on a single line, doing set differences to determine which atoms have been unwrapped and which are still to be unwrapped. This strikes me as excessive, and there are probably easy gains to be made there, perhaps bringing performance at least into the same order of magnitude as gmx trjconv. Stuff for a future PR.
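As an illustration only (plain Python, not the actual Cython code or the eventual fix), the traversal make_whole performs is a breadth-first walk along bonds; keeping a per-atom boolean visited flag makes each membership test an O(1) array read, avoiding repeated set arithmetic over the whole atom set:

```python
from collections import deque

def bfs_unwrap_order(n_atoms, bonds, ref=0):
    """Visit atoms outward from `ref` along bonds -- the traversal
    order in which make_whole propagates corrected positions."""
    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    seen = [False] * n_atoms  # flag array instead of set differences
    seen[ref] = True
    order, todo = [ref], deque([ref])
    while todo:
        atom = todo.popleft()
        for other in neighbours[atom]:
            if not seen[other]:
                seen[other] = True
                order.append(other)
                todo.append(other)
    return order

# A 4-atom chain 0-1-2-3 is reached in bond order from atom 0:
print(bfs_unwrap_order(4, [(0, 1), (1, 2), (2, 3)]))  # [0, 1, 2, 3]
```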
... and the results are in: @richardjgowers' tweak to make_whole brings the time of that synthetic trajectory test way down! We're now only 5x slower than gmx trjconv!
real 0m30.821s
user 0m55.682s
sys 0m29.721s
Comparing with develop (using an adaptation of the included asv benchmark that also does full universe unwrapping), this PR can already be over 100x (!!) faster:
=========== ============= =============
num_atoms develop this PR
----------- ------------- -------------
10 561±0.9μs 249±1μs
100 1.13±0ms 265±1μs
1000 17.2±0.04ms 3.66±0.01ms
10000 558±3ms 14.7±0.2ms
All (47k) 2.17±0.01s 21.3±0.2ms
=========== ============= =============
Line profiling of make_whole shows that the load is now much better distributed over the vector tasks. Still, 24.4% of the time goes to getting the bonds (line 247: bonds = atomgroup.bonds.to_indices()), which is likely tied to the slow bond access I had already seen above. 22.5% of the time goes to the subsequent selection of indices to work on. Here are the current line timings for make_whole:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
194 # map of global indices to local indices
195 1000 10156.0 10.2 0.0 ix_view = atomgroup.ix[:]
196 1000 1856.0 1.9 0.0 natoms = atomgroup.ix.shape[0]
197
198 1000 41394.0 41.4 0.1 oldpos = atomgroup.positions
199
200 # Nothing to do for less than 2 atoms
201 1000 442.0 0.4 0.0 if natoms < 2:
202 return np.array(oldpos)
203
204 1000 307.0 0.3 0.0 for i in range(natoms):
205 3341000 1389558.0 0.4 3.5 ix_to_rel[ix_view[i]] = i
206
207 1000 403.0 0.4 0.0 if reference_atom is None:
208 1000 287.0 0.3 0.0 ref = 0
209 else:
210 # Sanity check
211 if not reference_atom in atomgroup:
212 raise ValueError("Reference atom not in atomgroup")
213 ref = ix_to_rel[reference_atom.ix]
214
215 1000 6565.0 6.6 0.0 box = atomgroup.dimensions
216 1000 491.0 0.5 0.0 for i in range(3):
217 3000 950.0 0.3 0.0 half_box[i] = 0.5 * box[i]
218 3000 980.0 0.3 0.0 if box[i] == 0.0:
219 raise ValueError("One or more dimensions was zero. "
220 "You can set dimensions using 'atomgroup.dimensions='")
221
222 1000 330.0 0.3 0.0 ortho = True
223 1000 368.0 0.4 0.0 for i in range(3, 6):
224 3000 973.0 0.3 0.0 if box[i] != 90.0:
225 2000 665.0 0.3 0.0 ortho = False
226
227 1000 412.0 0.4 0.0 if ortho:
228 # If atomgroup is already unwrapped, bail out
229 is_unwrapped = True
230 for i in range(1, natoms):
231 for j in range(3):
232 if fabs(oldpos[i, j] - oldpos[0, j]) >= half_box[j]:
233 is_unwrapped = False
234 break
235 if not is_unwrapped:
236 break
237 if is_unwrapped:
238 return np.array(oldpos)
239 for i in range(3):
240 inverse_box[i] = 1.0 / box[i]
241 else:
242 1000 15963.0 16.0 0.0 from .mdamath import triclinic_vectors
243 1000 60991.0 61.0 0.2 tri_box = triclinic_vectors(box)
244
245 # C++ dict of bonds
246 1000 628.0 0.6 0.0 try:
247 1000 9645472.0 9645.5 24.4 bonds = atomgroup.bonds.to_indices()
248 except (AttributeError, NoDataError):
249 raise NoDataError("The atomgroup is required to have bonds")
250 1000 1317.0 1.3 0.0 for i in range(bonds.shape[0]):
251 3365000 933715.0 0.3 2.4 atom = bonds[i, 0]
252 3365000 915652.0 0.3 2.3 other = bonds[i, 1]
253 # only add bonds if both atoms are in atoms set
254 3365000 1367004.0 0.4 3.5 if ix_to_rel.count(atom) and ix_to_rel.count(other):
255 3365000 1047097.0 0.3 2.6 atom = ix_to_rel[atom]
256 3365000 1088477.0 0.3 2.8 other = ix_to_rel[other]
257
258 3365000 1704317.0 0.5 4.3 bonding[atom].insert(other)
259 3365000 1807016.0 0.5 4.6 bonding[other].insert(atom)
260
261 1000 12398.0 12.4 0.0 newpos = np.zeros((oldpos.shape[0], 3), dtype=np.float32)
262
263 1000 820.0 0.8 0.0 todo = intset()
264 1000 373.0 0.4 0.0 done = intset() # Who have I already searched around?
265 # initially we have one starting atom whose position is in correct image
266 1000 616.0 0.6 0.0 todo.insert(ref)
267 1000 320.0 0.3 0.0 for i in range(3):
268 3000 902.0 0.3 0.0 newpos[ref, i] = oldpos[ref, i]
269
270 1000 379.0 0.4 0.0 while not todo.empty():
271 3341000 911768.0 0.3 2.3 atom = deref(todo.begin())
272 3341000 997489.0 0.3 2.5 todo.erase(todo.begin())
273
274 6681000 2478349.0 0.4 6.3 for other in bonding[atom]:
275 # If other is already a refpoint, leave alone
276 6730000 2288131.0 0.3 5.8 if done.count(other) or todo.count(other):
277 3390000 986825.0 0.3 2.5 continue
278 # Draw vector from atom to other
279 3340000 909444.0 0.3 2.3 for i in range(3):
280 10020000 2754291.0 0.3 7.0 vec[i] = oldpos[other, i] - newpos[atom, i]
281 # Apply periodic boundary conditions to this vector
282 3340000 907822.0 0.3 2.3 if ortho:
283 minimum_image(&vec[0], &box[0], &inverse_box[0])
284 else:
285 3340000 1121333.0 0.3 2.8 minimum_image_triclinic(&vec[0], &tri_box[0, 0])
286 # Then define position of other based on this vector
287 3340000 908447.0 0.3 2.3 for i in range(3):
288 10020000 2738295.0 0.3 6.9 newpos[other, i] = newpos[atom, i] + vec[i]
289
290 # This other atom can now be used as a reference point
291 3340000 1185546.0 0.4 3.0 todo.insert(other)
292 3341000 1293332.0 0.4 3.3 done.insert(atom)
293
294 1000 304.0 0.3 0.0 if <np.intp_t> done.size() != natoms:
295 raise ValueError("AtomGroup was not contiguous from bonds, process failed")
296 1000 410.0 0.4 0.0 if inplace:
297 atomgroup.positions = newpos
298 1000 29068.0 29.1 0.1 return np.array(newpos)
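For reference, the minimum_image step used in the main loop above can be sketched in plain Python for the orthorhombic case (an illustrative reimplementation, not the C-level signature used in _cutil): each component of a displacement vector is shifted by whole box lengths until it lies within half a box of zero.

```python
import numpy as np

def minimum_image(vec, box):
    """Map a displacement vector to its minimum image (orthorhombic box)."""
    return vec - box * np.round(vec / box)

box = np.array([100.0, 100.0, 100.0])
# A bond that appears 99 Å long across the boundary is really 1 Å:
print(minimum_image(np.array([99.0, 0.0, 0.0]), box))  # [-1.  0.  0.]
```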
This PR would make on-the-fly (OTF) transformations not only usable but competitive...
Obviously, the PR desperately needs someone to merge it with current develop and then, most importantly, a few eyes that know this part of the code.
I would love it if this nice feature got into the code. I can try to rebase it, review the changes, and test it, even though I am not an expert on this part of the code.
That would be fantastic, @PicoCentauri !
Hello and as usual apologies for real life keeping me away from the cool stuff I'd like to make happen. If you're up to the task, @PicoCentauri, I'll try to keep an eye out for github pings and assist where possible.
No problem, an extra eye would be great. I did some tests on my own and it seems that everything is working great. If there are no major objections from the other @MDAnalysis/coredevs we can merge this soon and look at more speedups after this PR.
I’m going to want to review this. Performance is great, caches and complexity aren’t :)
@richardjgowers I agree with leaving the index-splitting caching out. Too complex, little gain. The fragment cache stays, right? There was quite some gain there.
@mnmelo, would you be able to remove the index caching? For you it is probably quick, while for me it would take a while to remove the correct lines.
@richardjgowers, @PicoCentauri: I finally had some time to go over this during the holidays. Other than having to get into the spirit of this code again, excising the cache was simpler than I thought. A question: I'm getting some darker_lint nagging about things like double quotes instead of single quotes, and some really questionable suggestions on how to break lines. This is new since I last submitted PRs. How many of those suggestions should I follow? (Or where can I read up on style guide updates? The Wiki page hasn't been updated since 2020.)
I am also trying to better establish the correctness of the ptp -> RtoS sequence @richardjgowers pointed out in a review comment. I'm trying to work out a proof either way. Of course, the time-saving idea here is not having to do RtoS on all the atoms, but only on the diagonals of the bounding boxes of compounds. I'll continue the discussion under that review comment.
I let @IAlibay speak to the darker_linting stuff — I think we take it mostly as suggestions as it is a stop-gap measure in the absence of PEP8speaks. At least that's my recollection.
Pretty much, all PEP8 things are really advisory anyways.
The only thing I would specifically look out for are flake8 messages, since those specifically highlight PEP8 violations rather than black formatting preferences.
Here we have one such violation, an indentation issue at core/groups.py:1964:47.
See: https://github.com/MDAnalysis/mdanalysis/actions/runs/3827118314/jobs/6511441089#step:4:448