mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Enabling of parallelization of `analysis.atomicdistances.AtomicDistances`

Open talagayev opened this issue 11 months ago • 10 comments
trafficstars

Changes made in this Pull Request:

  • added self.results and self.results.distances in analysis.atomicdistances.AtomicDistances for the use of Results

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4822.org.readthedocs.build/en/4822/

talagayev avatar Dec 05 '24 00:12 talagayev

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-12-12 23:12:15 UTC

pep8speaks avatar Dec 05 '24 00:12 pep8speaks

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.66%. Comparing base (a27591c) to head (e7da740). Report is 49 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4822      +/-   ##
===========================================
- Coverage    93.68%   93.66%   -0.03%     
===========================================
  Files          177      189      +12     
  Lines        21743    22813    +1070     
  Branches      3055     3055              
===========================================
+ Hits         20370    21367     +997     
- Misses         927     1000      +73     
  Partials       446      446              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Dec 05 '24 01:12 codecov[bot]

I was confused because the PR supposedly fixes #4819 but it does not.

Instead it provides a workaround for the problem identified in PR #4808 that we cannot currently parallelize AtomicDistances.

I would be ok with this hack to make parallelization work for 2.9.0 and then make the switch (fix for #4819) before 3.0. I'd like this PR to be clearer about what it does.

I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .

Yes my bad, I thought that it would fix #4819 but I think I missunderstood it, since I thought having it work with parallelization would fix the Issue, since it would have something that the ResultsGroup could use for the parallelization and thus use Results 🙈

talagayev avatar Dec 12 '24 23:12 talagayev

I was confused because the PR supposedly fixes #4819 but it does not. Instead it provides a workaround for the problem identified in PR #4808 that we cannot currently parallelize AtomicDistances. I would be ok with this hack to make parallelization work for 2.9.0 and then make the switch (fix for #4819) before 3.0. I'd like this PR to be clearer about what it does. I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .

Yes my bad, I thought that it would fix #4819 but I think I missunderstood it, since I thought having it work with parallelization would fix the Issue, since it would have something that the ResultsGroup could use for the parallelization and thus use Results 🙈

I will then rename the PR more like enabling the parallelization of analysis.atomicdistances.AtomicDistances and also remove the mention of the fix, change the changelog to it and the docs and wait for further instructions how to procede

talagayev avatar Dec 12 '24 23:12 talagayev

I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .

Yes, this is why I went silent on my review activity--I couldn't decide if I was just too out of the loop on the parallel business or if this didn't do what the targeted ticket asked for. I couldn't work out how the changes addressed the original problem so figured I may have just not understood.

My original understanding was that there was supposed to be some kind of results-related container object that may contain "slots" for specific results, and it was being incorrectly used to store a specific result itself.

If I understand this correctly, is it really that much more work to do it the "right" way? Or just a compatibility concern?

tylerjereddy avatar Dec 13 '24 01:12 tylerjereddy

It would be easy to do it right, just as you say — just not add the line that you commented on (self.results = self.results.distances). The problem is that the fix (for #4819) would break the previous behavior (so it's a "compatibility concern"). There were some concerns by @IAlibay https://github.com/MDAnalysis/mdanalysis/issues/4819#issuecomment-2515852750 to make the change. @talagayev is proposing a workaround that will allow (1) implementing the parallelization of the class (see discussion on #4808 why that would be good), (2) avoid breaking the user-facing code (and then properly fix in 3.0).

My opinion (as stated in https://github.com/MDAnalysis/mdanalysis/issues/4819#issuecomment-2515819696 ) is that I'd rather fix the API ("doing it right") and call the break a fix.

orbeckst avatar Dec 13 '24 07:12 orbeckst

I second Oliver on this -- I'd rather fix the non-compliant class and add the new functionality later, than add some temporary hacks.

If it's urgent to add the distances parallelization right now (which I doubt), it's always possible to add it as an mdakit which you'll deprecate when the upstream fixes the issue.

marinegor avatar Dec 14 '24 21:12 marinegor

Just wanted to say that I see the ping here but I haven't read anything and won't until the latter half of next week - apologies it's a busy end of year period.

IAlibay avatar Dec 14 '24 22:12 IAlibay

While giving Irfan more time, let me briefly reply to @marinegor https://github.com/MDAnalysis/mdanalysis/pull/4822#issuecomment-2543349469 :

I second Oliver on this -- I'd rather fix the non-compliant class and add the new functionality later, than add some temporary hacks.

If we decide to fix the non-compliant class (#4819 ) then the parallelization is already free — this PR already does it.

With the hack in this PR we could also have the parallelization and keep the old behavior of the class and then remove the one line that preserves the old API when we're ready for 3.0.

If it's urgent to add the distances parallelization right now (which I doubt), it's always possible to add it as an mdakit which you'll deprecate when the upstream fixes the issue.

It's not urgent but it's also hard to communicate to the outside world why something as simple as distance calculations cannot be immediately parallelized — it just looks a bit amateurish.

As a sidenote: MDAKits are not "free", they still require maintenance by someone and especially importing MDAKits in MDA as a transitional measure is particularly annoying because of package dependency issues. We are trying to get away from that pattern as quickly as possible.

orbeckst avatar Dec 17 '24 20:12 orbeckst

@orbeckst

It's not urgent but it's also hard to communicate to the outside world why something as simple as distance calculations cannot be immediately parallelized

oh, that makes sense. Though I guess the explanation about API consistency could do most of the explaining.

Also, a side note: it could be relatively low priority to parallelize specifically distances, since I guess it's relatively fast already, and reading from the disk would be a bottleneck for it, rather than the computation itself.

marinegor avatar Dec 18 '24 19:12 marinegor