Optimize RDF histogram with Cython+OpenMP
This PR adds an optimized histogram implementation using Cython and OpenMP that provides 10-18x speedup for RDF calculations with large datasets.
Context
This PR supersedes #5103, which used Numba. Following feedback from @orbeckst and @IAlibay that MDAnalysis traditionally uses Cython/C++ for acceleration, this implementation has been rewritten to use Cython with OpenMP, aligning with the project's existing infrastructure (similar to c_distances_openmp.pyx).
Implementation
The optimization strategies include:
- Cache-efficient memory access patterns with blocking
- Parallel execution using OpenMP with thread-local storage to avoid contention
- Reduced Python overhead through Cython compilation
- Automatic selection of serial vs parallel based on dataset size
Performance
With OMP_NUM_THREADS=4:
- 100k distances: 11.2x speedup
- 1M distances: 15.3x speedup
- 10M distances: 17.8x speedup
- Serial version: 5-7x consistent speedup
- 100% numerical accuracy (matches numpy.histogram within floating point precision)
Testing
- ✅ All 14 new histogram tests passing
- ✅ All 19 existing RDF tests passing
- ✅ Comprehensive correctness validation against numpy.histogram
- ✅ Serial/parallel consistency verified
- ✅ Tested with multiple data sizes, bin counts, and ranges
Technical Details
Files Modified:
-
package/MDAnalysis/lib/c_histogram.pyx- New Cython+OpenMP histogram implementation -
package/setup.py- Added histogram extension with OpenMP compilation flags -
package/MDAnalysis/analysis/rdf.py- Updated to use Cython histogram -
testsuite/MDAnalysisTests/lib/test_c_histogram.py- Comprehensive test suite -
package/CHANGELOG- Documented enhancement
Key Features:
- No external dependencies (Cython required only at build time)
- Follows MDAnalysis conventions (mirrors
c_distances_openmp.pyx) - Thread-safe parallel execution using chunking strategy
- Handles non-contiguous arrays automatically
Related to #3435
🤖 Generated with Claude Code (Sonnet 4.5), checked and approved by me.
PR Checklist
- [x] Issue raised/referenced?
- [x] Tests updated/added?
- [x] Documentation updated/added?
- [x]
package/CHANGELOGfile updated? - [x] Is your name in
package/AUTHORS? (Already added in previous PR) - [x] I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5128.org.readthedocs.build/en/5128/
@rhowardstone thank you for the cythonized PR (and your discussion post with background https://github.com/MDAnalysis/mdanalysis/discussions/5104 ). Sorry that you haven't heard from anyone in a while. We really appreciate your clear communication about using AI tools.
We currently don't have a clear policy in place how to handle primarily AI generated code. Until we have more clarity on how we as a project want to proceed, we are not going to merge this PR (so I'll leave a blocking review). I don't want to close the PR, though, and in fact I'd encourage you to resolve the merge conflicts so that the actual CI can run.
I also want to say that your PR is a good example for the potential and you're providing motivation to move forward with this challenging discussion.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 92.73%. Comparing base (5c7c480) to head (b4cd051).
Additional details and impacted files
@@ Coverage Diff @@
## develop #5128 +/- ##
===========================================
+ Coverage 92.68% 92.73% +0.04%
===========================================
Files 180 180
Lines 22452 22453 +1
Branches 3186 3186
===========================================
+ Hits 20809 20821 +12
- Misses 1169 1176 +7
+ Partials 474 456 -18
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Windows builds fail, see ege https://dev.azure.com/mdanalysis/mdanalysis/_build/results?buildId=8524&view=logs&j=bc985eae-65f0-56fb-9f4e-306b452dcfb3&t=99aa0f91-3087-50b5-0e7a-6e6967266174&l=2490
Okay! I think I got it? All tests now pass, let me know if there's anything else you need from me!