mlpack icon indicating copy to clipboard operation
mlpack copied to clipboard

Optimization of DBSCAN with OpenMP

Open MarkFischinger opened this issue 1 year ago • 1 comments

I've been working on optimizing our DBSCAN implementation, and I'm excited to share the results with you! This pull request introduces several performance enhancements that should make the clustering algorithm much faster, especially for larger datasets.

Changes made

  1. Implemented parallel processing for centroids calculation
  2. Improved parallelization of cluster assignments
  3. Optimized batch clustering with better OpenMP usage
  4. Introduced atomic operations to reduce critical sections
  5. Enhanced memory locality for better cache utilization

I ran some benchmarks to compare the performance of the original implementation with the optimized version.

output

I've tested this on a variety of datasets, and it seems to be working well. However, I'd really appreciate if you could take a look and let me know if there's anything I've missed or if you have any suggestions for further improvements.

MarkFischinger avatar Jul 22 '24 21:07 MarkFischinger

@MarkFischinger could you resolve the conflict with the master branch ? thank you

shrit avatar Aug 26 '24 15:08 shrit

@rcurtin i implemented the suggestion in the search range and search range rules. Even with those changes, performance wise my benchmarks did not show a significant improvement that could not be explained by the standard deviation. The first algorithmic improvement was not aligned with the functionality of dbscan therefore i reverted it. Because of the nature of dbscan, finding a parallelization method or algorithmic improvement is not simple. I will come back to this as soon as i have a new idea.

MarkFischinger avatar Aug 31 '24 20:08 MarkFischinger

Yeah, the way to do it (I did it in 2015 but for some reason never opened a PR because it needed tuning that I never had the time for) is to use OpenMP tasks in the dual-tree traversers, and tune it so a new task is only spawned every handful of levels. That was the only way I saw speedup for dual-tree algorithms. The idea of parallel dual-tree traversers (or even single-tree traversers) is a good idea, but probably needs a more in-depth dive that maybe we could do some other time. Realistically, DBSCAN speedup would then come just from speeding up the range search itself as that (I think) is the majority of the runtime.

If you think that we won't be able to get noticeable speedup here in this PR, then we can go ahead and close it, that's fine with me :+1: (or did I misunderstand your message and you still think there is something we could merge for improvement? either way works for me)

rcurtin avatar Sep 01 '24 01:09 rcurtin