ginkgo
ginkgo copied to clipboard
OpenMP Merge-SpMV algorithm from Merril & Garland for CSR SpMV
This PR brings Merge-SpMV algorithm from Merril & Garland for CSR SpMV. The implementation is based on the original article https://github.com/dumerrill/merge-spmv/raw/master/merge-based-spmv-sc16-preprint.pdf and source code https://github.com/dumerrill/merge-spmv/blob/master/cpu_spmv.cpp from Merril & Garland work back in 2016. It is using a "2D" approach to do a better load-balancing between the threads. In my extensive investigation on 6 Arm and x86 machines with 8 SuiteSparse matrices using Ginkgo 1.6, this algorithm was performing between -4% to 80% better than the current CSR SpMV implementation.
Implementation in this PR was tested and optimized only for RHS=1, but in principle it should work for RHS>1 as well. Note that I have also tested the second variant of Merge-SpMV algorithm (Algorithm 4 in https://github.com/dumerrill/merge-spmv/raw/master/merge-based-spmv-sc16-preprint.pdf), but it was always performing worse than the original Merge-SpMV algorithm (except for a few matrices on A64FX machine, but I believe this can be ignored).
At the moment the code in this PR still lacks 3 things that I leave to the experts to propose what is the best way to do it:
- Integration with the rest of Ginkgo.
- If necessary, addition of any tests or comparisons with the default CSR SpMV.
- I guess my name and affiliation ("Luka Stanisic [email protected], Huawei Technologies Duesseldorf GmbH") as well as Merge-SpMV project (3-Clause BSD license https://github.com/dumerrill/merge-spmv/blob/master/LICENSE.TXT ) should be mentioned somewhere.
Thank you for the review @yhmtsai, I agree with every single comment that you have made. Unfortunately, I am losing the access to the machine from which I can make public contributions. Therefore, I was hoping that Ginkgo community could take over with merging this PR.