mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Feature (un)wrap enhancement

Open mnmelo opened this issue 4 years ago • 16 comments

Changes made in this Pull Request:

  • Optimizes unwrapping by pre-screening AG fragments that span more than half the box vectors;
  • Extends the universe-validated cache mechanism to avoid re-calculating the compound splits involved in fragment unwrapping.
  • Corner case of massless, single-particle compounds -- virtual-sites, or TIP4P dummy particles -- now handled gracefully by unwrap (addresses #3168). Does not ensure that those virtual sites are unwrapped with the rest of the molecule, though.

Performance gains are already of ~10x for large systems. For the AdK test system (from the .gro file) I got the following:

 ===========   =============   =============   =============
                                  this PR         this PR
  num_atoms       develop       (no caching)     (caching)
 -----------   -------------   -------------   -------------
      10         561±0.9μs        431±4μs         389±1μs
     100          1.13±0ms        640±3μs         596±4μs
     1000       17.2±0.04ms     17.3±0.02ms     17.3±0.05ms
    10000         558±3ms        170±0.6ms       171±0.8ms
  All (47k)      2.17±0.01s      239±0.4ms       239±0.6ms
 ===========   =============   =============   =============

(To be comparable with develop, this test ignores the TIP4P massless MW particles)

Note that this is already including fragment caching (in develop). The caching that is compared here is just the cache of the operation of index splitting per fragment. That caching seems to only help marginally, but I'm not sure we should just leave it out because I imagine some systems may be heavier in this aspect.

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

mnmelo avatar Mar 15 '21 15:03 mnmelo

Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1566:80: E501 line too long (80 > 79 characters) Line 1571:1: E302 expected 2 blank lines, found 1

Comment last updated at 2022-06-08 15:16:52 UTC

pep8speaks avatar Mar 15 '21 15:03 pep8speaks

Codecov Report

Base: 93.50% // Head: 93.51% // Increases project coverage by +0.01% :tada:

Coverage data is based on head (7a5a0df) compared to base (130e862). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3169      +/-   ##
===========================================
+ Coverage    93.50%   93.51%   +0.01%     
===========================================
  Files          190      190              
  Lines        24943    24989      +46     
  Branches      3523     3535      +12     
===========================================
+ Hits         23322    23368      +46     
  Misses        1100     1100              
  Partials       521      521              
Impacted Files Coverage Δ
package/MDAnalysis/core/groups.py 97.64% <100.00%> (+0.12%) :arrow_up:
package/MDAnalysis/core/topologyattrs.py 95.81% <100.00%> (+0.01%) :arrow_up:
package/MDAnalysis/core/universe.py 97.26% <100.00%> (ø)
package/MDAnalysis/lib/_cutil.pyx 100.00% <100.00%> (ø)
package/MDAnalysis/lib/util.py 89.50% <100.00%> (-0.15%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Mar 15 '21 15:03 codecov[bot]

Sadly, this speedup is still well behind gmx trjconv speeds :( For a synthetic test with an adk.xtc consisting of 1000 frames of the same structure (adk_oplsaa.gro) I get the following timings:

echo 0 | gmx trjconv -f adk.xtc -s adk_oplsaa.tpr -pbc mol -o pbc.xtc
real	0m6.313s
user	0m4.830s
sys	0m0.114s
...
for _frame in u.trajectory:
    u.atoms.unwrap()
    wrt.write(u.atoms)
...

real	4m8.090s
user	5m25.783s
sys	1m25.677s

Even with the ~10x speedup of this PR we're still some ~50x slower than gmx trjconv. Profiling the same 1000-frame unwrapping with MDA gives this:

         53456901 function calls (53383208 primitive calls) in 252.512 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000  138.932    0.139  147.632    0.148 {MDAnalysis.lib._cutil.make_whole}
   115538   36.884    0.000   36.884    0.000 {built-in method numpy.array}
     2001   28.912    0.014   28.912    0.014 {built-in method builtins.sorted}
     2001   18.361    0.009   22.284    0.011 topologyattrs.py:2310(<listcomp>)
     2001    7.138    0.004   97.336    0.049 topologyattrs.py:2307(get_atoms)
    25002    6.028    0.000    6.028    0.000 {method 'reduce' of 'numpy.ufunc' objects}
51071684/51024001    3.889    0.000    4.617    0.000 util.py:1542(wrapper)
     1001    1.802    0.002    1.805    0.002 {method 'read' of 'MDAnalysis.lib.formats.libmdaxdr.XTCFile' objects}
     1000    1.682    0.002  248.248    0.248 groups.py:1683(unwrap)
     1000    1.295    0.001    1.299    0.001 {method 'write' of 'MDAnalysis.lib.formats.libmdaxdr.XTCFile' objects}
     6001    0.725    0.000   98.494    0.016 topologyattrs.py:425(__getitem__)
     ...

It seems the bulk of the work is now down to lib._cutil.make_whole (ping @zemanj, who has a branch on this optimization from the Cython side). There are other possibly excessive calls to np.array and to sort. Because I bundled reading and writing I can't be sure what those refer to.

mnmelo avatar Mar 15 '21 16:03 mnmelo

Made a final optimization, and I think this is ready to merge.

An initial check for bonds was being made that was costing about 40% of the entire unwrap call (see above the call to topologyattrs.py:425(__getitem__)). This is unneeded, because _split_by_compound_indices and make_whole do that check on their own, and also because no really heavy work is done before those later checks.

Updated timings for MDA (now only ~27x slower than gmx trjconv!):

real	2m42.869s
user	4m1.193s
sys	1m24.911s

With this last optimization, 88.5% of the unwrap call is now spent running the cythonized make_whole function, which seems to better reflect Python's role as a wrapper in this case.

I crawled a bit deeper into the rabbit whole, and was a bit surprised to realize that inside make_whole, 83% of that time is spent on this line, doing set differences to determine which atoms have been unwrapped, and which are still to-unwrap. This strikes me as excessive, and probably there's easy gains to make there, perhaps bringing performance at least into the same order of magnitude as gmx trjconv. Stuff for a future PR.

mnmelo avatar Apr 04 '21 05:04 mnmelo

... and the results are in: @richardjgowers' tweak to make_whole brings the time of that synthetic trajectory test way down! We're now only 5x slower than gmx trjconv!

real	0m30.821s
user	0m55.682s
sys	0m29.721s

Comparing with develop (using an adaptation of the included asv benchmark that also does full universe unwrapping) this PR can be already over 100x (!!) faster:

 ===========   =============   =============
  num_atoms       develop         this PR
 -----------   -------------   -------------
       10         561±0.9μs        249±1μs
      100          1.13±0ms        265±1μs
     1000       17.2±0.04ms     3.66±0.01ms
    10000         558±3ms        14.7±0.2ms
  All (47k)      2.17±0.01s      21.3±0.2ms
 ===========   =============   =============

Line profiling of make_whole shows that load is much better distributed over vector tasks. Still, 24.4% of the time goes to the getting of bonds (line 247: bonds = atomgroup.bonds.to_indices()), which is likely tied to the slow bond accession I had already seen above. 22.5% of the time goes to the subsequent selection of indices to work on. Here's the current line timings for make_whole:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   194                                               # map of global indices to local indices
   195      1000      10156.0     10.2      0.0      ix_view = atomgroup.ix[:]
   196      1000       1856.0      1.9      0.0      natoms = atomgroup.ix.shape[0]
   197
   198      1000      41394.0     41.4      0.1      oldpos = atomgroup.positions
   199
   200                                               # Nothing to do for less than 2 atoms
   201      1000        442.0      0.4      0.0      if natoms < 2:
   202                                                   return np.array(oldpos)
   203
   204      1000        307.0      0.3      0.0      for i in range(natoms):
   205   3341000    1389558.0      0.4      3.5          ix_to_rel[ix_view[i]] = i
   206
   207      1000        403.0      0.4      0.0      if reference_atom is None:
   208      1000        287.0      0.3      0.0          ref = 0
   209                                               else:
   210                                                   # Sanity check
   211                                                   if not reference_atom in atomgroup:
   212                                                       raise ValueError("Reference atom not in atomgroup")
   213                                                   ref = ix_to_rel[reference_atom.ix]
   214
   215      1000       6565.0      6.6      0.0      box = atomgroup.dimensions
   216      1000        491.0      0.5      0.0      for i in range(3):
   217      3000        950.0      0.3      0.0          half_box[i] = 0.5 * box[i]
   218      3000        980.0      0.3      0.0          if box[i] == 0.0:
   219                                                       raise ValueError("One or more dimensions was zero.  "
   220                                                                        "You can set dimensions using 'atomgroup.dimensions='")
   221
   222      1000        330.0      0.3      0.0      ortho = True
   223      1000        368.0      0.4      0.0      for i in range(3, 6):
   224      3000        973.0      0.3      0.0          if box[i] != 90.0:
   225      2000        665.0      0.3      0.0              ortho = False
   226
   227      1000        412.0      0.4      0.0      if ortho:
   228                                                   # If atomgroup is already unwrapped, bail out
   229                                                   is_unwrapped = True
   230                                                   for i in range(1, natoms):
   231                                                       for j in range(3):
   232                                                           if fabs(oldpos[i, j] - oldpos[0, j]) >= half_box[j]:
   233                                                               is_unwrapped = False
   234                                                               break
   235                                                       if not is_unwrapped:
   236                                                           break
   237                                                   if is_unwrapped:
   238                                                       return np.array(oldpos)
   239                                                   for i in range(3):
   240                                                       inverse_box[i] = 1.0 / box[i]
   241                                               else:
   242      1000      15963.0     16.0      0.0          from .mdamath import triclinic_vectors
   243      1000      60991.0     61.0      0.2          tri_box = triclinic_vectors(box)
   244
   245                                               # C++ dict of bonds
   246      1000        628.0      0.6      0.0      try:
   247      1000    9645472.0   9645.5     24.4          bonds = atomgroup.bonds.to_indices()
   248                                               except (AttributeError, NoDataError):
   249                                                   raise NoDataError("The atomgroup is required to have bonds")
   250      1000       1317.0      1.3      0.0      for i in range(bonds.shape[0]):
   251   3365000     933715.0      0.3      2.4          atom = bonds[i, 0]
   252   3365000     915652.0      0.3      2.3          other = bonds[i, 1]
   253                                                   # only add bonds if both atoms are in atoms set
   254   3365000    1367004.0      0.4      3.5          if ix_to_rel.count(atom) and ix_to_rel.count(other):
   255   3365000    1047097.0      0.3      2.6              atom = ix_to_rel[atom]
   256   3365000    1088477.0      0.3      2.8              other = ix_to_rel[other]
   257
   258   3365000    1704317.0      0.5      4.3              bonding[atom].insert(other)
   259   3365000    1807016.0      0.5      4.6              bonding[other].insert(atom)
   260
   261      1000      12398.0     12.4      0.0      newpos = np.zeros((oldpos.shape[0], 3), dtype=np.float32)
   262
   263      1000        820.0      0.8      0.0      todo = intset()
   264      1000        373.0      0.4      0.0      done = intset()  # Who have I already searched around?
   265                                               # initially we have one starting atom whose position is in correct image
   266      1000        616.0      0.6      0.0      todo.insert(ref)
   267      1000        320.0      0.3      0.0      for i in range(3):
   268      3000        902.0      0.3      0.0          newpos[ref, i] = oldpos[ref, i]
   269
   270      1000        379.0      0.4      0.0      while not todo.empty():
   271   3341000     911768.0      0.3      2.3          atom = deref(todo.begin())
   272   3341000     997489.0      0.3      2.5          todo.erase(todo.begin())
   273
   274   6681000    2478349.0      0.4      6.3          for other in bonding[atom]:
   275                                                       # If other is already a refpoint, leave alone
   276   6730000    2288131.0      0.3      5.8              if done.count(other) or todo.count(other):
   277   3390000     986825.0      0.3      2.5                  continue
   278                                                       # Draw vector from atom to other
   279   3340000     909444.0      0.3      2.3              for i in range(3):
   280  10020000    2754291.0      0.3      7.0                  vec[i] = oldpos[other, i] - newpos[atom, i]
   281                                                       # Apply periodic boundary conditions to this vector
   282   3340000     907822.0      0.3      2.3              if ortho:
   283                                                           minimum_image(&vec[0], &box[0], &inverse_box[0])
   284                                                       else:
   285   3340000    1121333.0      0.3      2.8                  minimum_image_triclinic(&vec[0], &tri_box[0, 0])
   286                                                       # Then define position of other based on this vector
   287   3340000     908447.0      0.3      2.3              for i in range(3):
   288  10020000    2738295.0      0.3      6.9                  newpos[other, i] = newpos[atom, i] + vec[i]
   289
   290                                                       # This other atom can now be used as a reference point
   291   3340000    1185546.0      0.4      3.0              todo.insert(other)
   292   3341000    1293332.0      0.4      3.3          done.insert(atom)
   293
   294      1000        304.0      0.3      0.0      if <np.intp_t> done.size() != natoms:
   295                                                   raise ValueError("AtomGroup was not contiguous from bonds, process failed")
   296      1000        410.0      0.4      0.0      if inplace:
   297                                                   atomgroup.positions = newpos
   298      1000      29068.0     29.1      0.1      return np.array(newpos)

mnmelo avatar Apr 04 '21 23:04 mnmelo

This PR would make OTF transformations not only useable but competitive...

Obviously, the PR desperately needs someone to merge with current develop and the, most importantly, a few eyes that know the part of the code.

orbeckst avatar Apr 14 '22 08:04 orbeckst

I love it if this nice feature gets into the code. I can try to rebase it and review the changes and test it, even though I am not an expert on this part of the code.

PicoCentauri avatar Jun 03 '22 08:06 PicoCentauri

That would be fantastic, @PicoCentauri !

orbeckst avatar Jun 03 '22 17:06 orbeckst

Hello and as usual apologies for real life keeping me away from the cool stuff I'd like to make happen. If you're up to the task, @PicoCentauri, I'll try to keep an eye out for github pings and assist where possible.

mnmelo avatar Jun 10 '22 14:06 mnmelo

No problem, an eye would be great. I did some tests by my own and it seems that everything is working great. If there are no major objections by the other @MDAnalysis/coredevs we can soon merge this and look and more speedups after this PR.

PicoCentauri avatar Jun 11 '22 05:06 PicoCentauri

I’m going to want to review this. Performance is great, caches and complexity aren’t :)

On Sat, Jun 11, 2022 at 06:17, Philip Loche @.***> wrote:

No problem, an eye would be great. I did some tests by my own and it seems that everything is working great. If there are no major objections by the other @MDAnalysis/coredevs https://github.com/orgs/MDAnalysis/teams/coredevs we can soon merge this and look and more speedups after this PR.

— Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/pull/3169#issuecomment-1152859066, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGBYFXYWMQXN4QYPVS5LVOQOPDANCNFSM4ZGX7EHQ . You are receiving this because you were mentioned.Message ID: @.***>

richardjgowers avatar Jun 11 '22 07:06 richardjgowers

@richardjgowers I agree with leaving the index-splitting caching out. Too complex, little gain. The fragment cache stays, right? There was quite some gain there.

mnmelo avatar Jun 20 '22 01:06 mnmelo

@mnmelo would be able to remove the index caching? For you it is probably quick while for me it would took a while to remove the correct lines.

PicoCentauri avatar Jun 21 '22 13:06 PicoCentauri

@richardjgowers, @PicoCentauri: I finally had some time to go over this during the holidays. Other than having to get into the spirit of this code again, excising the cache was simpler than I thought. A question: I'm getting some darker_lint nagging on stuff like double quotes instead of single, and some really questionable suggestions on how to break lines. This is new since I last submitted PRs. How much of those suggestions should I follow? (Or where can I read up on style guide updates? The Wiki page hasn't been updated since 2020).

I am also trying to prove better the correctness of the ptp -> RtoS sequence @richardjgowers pointed out in a review comment. I'm trying to work out some proof either way. Of course the time-saving idea here is not to have to do RtoS on all the atoms, but only on the diagonals of bounding boxes of compounds. I'll continue the discussion under that review comment.

mnmelo avatar Jan 03 '23 11:01 mnmelo

I let @IAlibay speak to the darker_linting stuff — I think we take it mostly as suggestions as it is a stop-gap measure in the absence of PEP8speaks. At least that's my recollection.

orbeckst avatar Jan 06 '23 00:01 orbeckst

I let @IAlibay speak to the darker_linting stuff — I think we take it mostly as suggestions as it is a stop-gap measure in the absence of PEP8speaks. At least that's my recollection.

Pretty much, all PEP8 things are really advisory anyways.

The only thing I would specifically look out for are flake8 messages, since those specifically highlight PEP8 violations rather than black formatting preferences.

Here we have one such violation, a indentation issue for core/groups.py:1964:47

See: https://github.com/MDAnalysis/mdanalysis/actions/runs/3827118314/jobs/6511441089#step:4:448

IAlibay avatar Jan 06 '23 00:01 IAlibay