mdanalysis
mdanalysis copied to clipboard
Enabling of parallelization of `analysis.atomicdistances.AtomicDistances`
Changes made in this Pull Request:
- added
self.resultsandself.results.distancesinanalysis.atomicdistances.AtomicDistancesfor the use ofResults
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Developers certificate of origin
- [x] I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.
📚 Documentation preview 📚: https://mdanalysis--4822.org.readthedocs.build/en/4822/
Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2024-12-12 23:12:15 UTC
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.66%. Comparing base (
a27591c) to head (e7da740). Report is 49 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #4822 +/- ##
===========================================
- Coverage 93.68% 93.66% -0.03%
===========================================
Files 177 189 +12
Lines 21743 22813 +1070
Branches 3055 3055
===========================================
+ Hits 20370 21367 +997
- Misses 927 1000 +73
Partials 446 446
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
🚀 New features to boost your workflow:
- ❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
I was confused because the PR supposedly fixes #4819 but it does not.
Instead it provides a workaround for the problem identified in PR #4808 that we cannot currently parallelize AtomicDistances.
I would be ok with this hack to make parallelization work for 2.9.0 and then make the switch (fix for #4819) before 3.0. I'd like this PR to be clearer about what it does.
I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .
Yes my bad, I thought that it would fix #4819 but I think I missunderstood it, since I thought having it work with parallelization would fix the Issue, since it would have something that the ResultsGroup could use for the parallelization and thus use Results 🙈
I was confused because the PR supposedly fixes #4819 but it does not. Instead it provides a workaround for the problem identified in PR #4808 that we cannot currently parallelize AtomicDistances. I would be ok with this hack to make parallelization work for 2.9.0 and then make the switch (fix for #4819) before 3.0. I'd like this PR to be clearer about what it does. I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .
Yes my bad, I thought that it would fix #4819 but I think I missunderstood it, since I thought having it work with parallelization would fix the Issue, since it would have something that the
ResultsGroupcould use for the parallelization and thus useResults🙈
I will then rename the PR more like enabling the parallelization of analysis.atomicdistances.AtomicDistances and also remove the mention of the fix, change the changelog to it and the docs and wait for further instructions how to procede
I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .
Yes, this is why I went silent on my review activity--I couldn't decide if I was just too out of the loop on the parallel business or if this didn't do what the targeted ticket asked for. I couldn't work out how the changes addressed the original problem so figured I may have just not understood.
My original understanding was that there was supposed to be some kind of results-related container object that may contain "slots" for specific results, and it was being incorrectly used to store a specific result itself.
If I understand this correctly, is it really that much more work to do it the "right" way? Or just a compatibility concern?
It would be easy to do it right, just as you say — just not add the line that you commented on (self.results = self.results.distances). The problem is that the fix (for #4819) would break the previous behavior (so it's a "compatibility concern"). There were some concerns by @IAlibay https://github.com/MDAnalysis/mdanalysis/issues/4819#issuecomment-2515852750 to make the change. @talagayev is proposing a workaround that will allow (1) implementing the parallelization of the class (see discussion on #4808 why that would be good), (2) avoid breaking the user-facing code (and then properly fix in 3.0).
My opinion (as stated in https://github.com/MDAnalysis/mdanalysis/issues/4819#issuecomment-2515819696 ) is that I'd rather fix the API ("doing it right") and call the break a fix.
I second Oliver on this -- I'd rather fix the non-compliant class and add the new functionality later, than add some temporary hacks.
If it's urgent to add the distances parallelization right now (which I doubt), it's always possible to add it as an mdakit which you'll deprecate when the upstream fixes the issue.
Just wanted to say that I see the ping here but I haven't read anything and won't until the latter half of next week - apologies it's a busy end of year period.
While giving Irfan more time, let me briefly reply to @marinegor https://github.com/MDAnalysis/mdanalysis/pull/4822#issuecomment-2543349469 :
I second Oliver on this -- I'd rather fix the non-compliant class and add the new functionality later, than add some temporary hacks.
If we decide to fix the non-compliant class (#4819 ) then the parallelization is already free — this PR already does it.
With the hack in this PR we could also have the parallelization and keep the old behavior of the class and then remove the one line that preserves the old API when we're ready for 3.0.
If it's urgent to add the distances parallelization right now (which I doubt), it's always possible to add it as an mdakit which you'll deprecate when the upstream fixes the issue.
It's not urgent but it's also hard to communicate to the outside world why something as simple as distance calculations cannot be immediately parallelized — it just looks a bit amateurish.
As a sidenote: MDAKits are not "free", they still require maintenance by someone and especially importing MDAKits in MDA as a transitional measure is particularly annoying because of package dependency issues. We are trying to get away from that pattern as quickly as possible.
@orbeckst
It's not urgent but it's also hard to communicate to the outside world why something as simple as distance calculations cannot be immediately parallelized
oh, that makes sense. Though I guess the explanation about API consistency could do most of the explaining.
Also, a side note: it could be relatively low priority to parallelize specifically distances, since I guess it's relatively fast already, and reading from the disk would be a bottleneck for it, rather than the computation itself.