Parallelizes `MDAnalysis.analysis.msd`
Fixes #4676
Changes made in this Pull Request:
- Added the split-apply-combine technique to parallelize `MDAnalysis.analysis.msd.EinsteinMSD`
- Added boilerplate fixture(s) to `testsuite/analysis/conftest.py`, analogous with existing ones
- Added a `client_EinsteinMSD` fixture to all tests in `testsuite/MDAnalysisTests/analysis/test_msd.py`, and modified the way the `run()` method is called to `run(**client_EinsteinMSD)`
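For readers unfamiliar with the split-apply-combine approach named above, here is a minimal, self-contained sketch in plain Python. It is not the code from this PR and does not use MDAnalysis: `split_frames`, `collect_positions`, `combine`, `msd_1d`, and `run` are hypothetical names, a thread pool stands in for the real backends, and the trajectory is assumed to be a list of per-frame coordinate lists. The sketch also assumes that only the per-frame position collection is split across workers, with the MSD computed once on the recombined data.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_parts):
    """Split: cut range(n_frames) into n_parts contiguous, near-equal chunks."""
    base, extra = divmod(n_frames, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def collect_positions(trajectory, frames):
    """Apply: copy the positions of every particle for one chunk of frames."""
    return [list(trajectory[f]) for f in frames]


def combine(parts):
    """Combine: concatenate the per-chunk results back into frame order."""
    return [frame for part in parts for frame in part]


def msd_1d(positions):
    """Windowed MSD for every lag, averaged over time origins and particles."""
    n_frames = len(positions)
    n_particles = len(positions[0])
    msd = []
    for lag in range(n_frames):
        total, count = 0.0, 0
        for t0 in range(n_frames - lag):
            for p in range(n_particles):
                d = positions[t0 + lag][p] - positions[t0][p]
                total += d * d
                count += 1
        msd.append(total / count)
    return msd


def run(trajectory, n_workers=1):
    """Collect positions chunk by chunk, then compute the MSD once at the end."""
    chunks = split_frames(len(trajectory), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the order of `chunks`, so frame order is kept
        parts = list(pool.map(lambda fr: collect_positions(trajectory, fr), chunks))
    return msd_1d(combine(parts))
```

In this toy setup, `run(traj, n_workers=3)` and `run(traj, n_workers=1)` return the same list, because the combine step restores the full, ordered position array before any displacement is computed.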
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Developers certificate of origin
- [x] I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.
📚 Documentation preview 📚: https://mdanalysis--4896.org.readthedocs.build/en/4896/
Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2025-01-20 21:03:12 UTC
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.41%. Comparing base (7fb3534) to head (ba68a93).
Additional details and impacted files
@@ Coverage Diff @@
## develop #4896 +/- ##
===========================================
- Coverage 93.42% 93.41% -0.01%
===========================================
Files 177 189 +12
Lines 21865 22945 +1080
Branches 3079 3079
===========================================
+ Hits 20427 21435 +1008
- Misses 986 1059 +73
+ Partials 452 451 -1
Just a reminder that I think this is ready to be merged. Please do so at your convenience. @RMeli @orbeckst
Thanks for your work. I'm currently quite busy, so might not be able to review in the next few days. Please be patient.
@talagayev / @marinegor can you have a look at this PR, please?
Checked the code and ran locally, looks all good.
https://github.com/tanishy7777/mdanalysis/blob/18a2e516d914f6dc438b409b403be2a1a3429e77/testsuite/MDAnalysisTests/analysis/test_msd.py#L155
Here @tanishy7777 you could also add `**client_EinsteinMSD` to cover the parallelization in `test_simple_start_stop_step_all_dims` and `test_fft_start_stop_step_all_dims`, but I would rely on what @orbeckst suggests as to whether these need `**client_EinsteinMSD` or not.
From my side it looks good, good job @tanishy7777 :)
Thanks a lot for reviewing my PR, will wait for the suggestions as you mentioned.
Also, could you please review PR #4884? It's pretty similar. Or tell me if it needs any more work. Thanks again!
Hey @tanishy7777, yes the PR is similar, I can take a look at it as well.
Blocking here; I just need to check the implementation. IIRC there is a reason the MSD algorithm itself is non-parallelisable, but it may not apply if only the collection of particle positions is parallelised.
The tests were passing, so I thought it had been parallelized.
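One way to see the concern about the MSD algorithm itself, as a toy illustration only: a windowed MSD pairs every frame with every later frame, so computing it independently per chunk drops all pairs that span a chunk boundary. The sketch below uses a hypothetical `msd` helper on a single 1D track in plain Python; it is not the `EinsteinMSD` implementation.

```python
def msd(positions):
    """Windowed MSD of one 1D track: for each lag, average over all time origins."""
    n = len(positions)
    out = []
    for lag in range(n):
        sq = [(positions[t + lag] - positions[t]) ** 2 for t in range(n - lag)]
        out.append(sum(sq) / len(sq))
    return out


# Toy track moving one unit per frame (an assumption made for illustration).
track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

full = msd(track)              # lags 0..5 are available
first_half = msd(track[:3])    # only lags 0..2 can be formed inside this chunk
second_half = msd(track[3:])   # likewise; pairs such as (frame 2, frame 3) are never seen
```

The per-chunk results have no entries for lags 3 to 5, and even the short lags are averaged over fewer frame pairs, so they cannot simply be merged into the full result. Collecting positions per chunk and computing the MSD after recombining avoids this.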
hi @tanishy7777, sorry for long review -- many life things got in the way.
I think you're on a good path; I mentioned a few minor things in the comments.
Main action items:
- move the hacky `@staticmethod def f(arrays): pass` out of the class
- using datafiles in MDAnalysisTests (imported at the top of `test_msd.py`), check that the parallelized run produces exactly the same `results` as the non-parallelized one (add a code snippet to the comments that anyone can run to check, along with its results)
- add this check as a test (if it's too slow, we can always mark it as slow and not run it by default)
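The shape of such an equivalence check could look roughly like the sketch below. It is a stand-in under stated assumptions, not the test added in this PR: `ToyAnalysis`, `CLIENTS`, and `check_backends_agree` are hypothetical names, the `"multiprocessing"` backend string is only a label here (no processes are started), and the chunked sum of squares replaces the real analysis. What it does show is the `run(**client)` pattern, where each client is a dict of keyword arguments, and the comparison of every `results` entry against the serial reference.

```python
import math


class ToyAnalysis:
    """Hypothetical stand-in for an analysis class with run(backend=..., n_workers=...)."""

    def __init__(self, data):
        self.data = list(data)
        self.results = {}

    def run(self, backend="serial", n_workers=1):
        if backend == "serial":
            n_workers = 1
        # Split the data into at most n_workers contiguous chunks.
        size = max(1, -(-len(self.data) // n_workers))  # ceiling division
        chunks = [self.data[i:i + size] for i in range(0, len(self.data), size)]
        # Apply a per-chunk computation, then combine the partial results.
        partial = [sum(x * x for x in chunk) for chunk in chunks]
        self.results["sum_sq"] = math.fsum(partial)
        return self


# Keyword-argument sets in the spirit of a client fixture (values are assumptions).
CLIENTS = [
    {"backend": "serial"},
    {"backend": "multiprocessing", "n_workers": 2},
    {"backend": "multiprocessing", "n_workers": 3},
]


def check_backends_agree(data, clients=CLIENTS, rel_tol=1e-12):
    """Return True if every client reproduces the results of the first one."""
    reference = ToyAnalysis(data).run(**clients[0]).results
    for client in clients[1:]:
        results = ToyAnalysis(data).run(**client).results
        for key, expected in reference.items():
            if not math.isclose(results[key], expected, rel_tol=rel_tol):
                return False
    return True
```

The tolerance is a choice made for this sketch. If the combine step only concatenates collected positions, exact equality may be the more appropriate assertion, as requested in the review.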
Please ask if you have questions, and ping me here if I don't reply for more than 48 hours.
Hey, sorry for the late response. I had semester exams so I was quite busy the last 2 weeks. Will start working on this soon!
@marinegor
As suggested in https://github.com/MDAnalysis/mdanalysis/pull/4896#discussion_r1962561381
I have added tests that check the equivalence of the `msd` algorithm results with both the `serial` and `multiprocessing` backends.
After running the tests, the results seem to be equal across both backends. Could you please review the implementation and confirm if this is correct?
Also, is this a good way of adding this test? Should I add it for all other methods that were parallelized?
I have added the following test for comparing the result.attributes
This test is passing.
Thank you!
@marinegor just a gentle reminder
@marinegor tests pass, codecov is good, linter is happy – could you please have another look?
hey @tanishy7777 sorry for the long response, I had serious personal issues I had to attend to.
I'm planning to get back to reviewing PR -- could you please resolve the merge conflicts so I could do that?