Offload matrix based interpolation
I've opened this as a draft PR to get early feedback and to recognise that we might want to iterate on the design.
This PR:
- adds a SparseMatrix class into Atlas that wraps eckit's SparseMatrix class to combine it with GPU memory management
- adds a hicsparse backend to Atlas's sparse linear algebra interface
- adds "multiply-add" support into Atlas's sparse linear algebra interface
- updates interpolation to support use of the hicparse backend of Atlas's sparse linear algebra interface
Thanks @l90lpa, so far I had a first look at the atlas::linalg::SparseMatrix class only.
I am making a note that in principle it could also just inherit from eckit::linalg::SparseMatrix, and add the device capability on top. On the other hand this way it's possible to implement it differently, and construct an eckit::linalg::SparseMatrix only when needed.
It should be mentioned explicitly that this PR depends on first merging #237 and should then be rebased on develop.
Perhaps this PR is adding too many ingredients at once. Maybe we could just focus on the gpu-offloading first and add the 'multiply-add' later?
In that respect of multiply-add (for a new PR), should matrix_multiply_add not immediately also implement the following formula including $$\alpha$$ and $$\beta$$?
y = \alpha A x + \beta y
I guess here you have $$\alpha=1$$ and $$\beta=1$$, so that:
y = A x + y
That could still be a simpler overloaded API for sure.
Perhaps this PR is adding too many ingredients at once. Maybe we could just focus on the gpu-offloading first and add the 'multiply-add' later?
In that respect of multiply-add (for a new PR), should matrix_multiply_add not immediately also implement the following formula including α and β ? y = α A x + β y
I guess here you have α = 1 and β = 1 , so that: y = A x + y
That could still be a simpler overloaded API for sure.
Regarding PR content: I'm happy to break-up the PR however you would like. I only included the complete work so that you would be able to see an overview in advance. That said, I just want to mention that the reason multiply-add is part of this work is to facilitate the GPU offload of interpolation's execute_adjoint.
Regarding multiply-add: whether matrix_multiply_add should immediately implement the more broad interface is up to you (I'm happy either way). I chose not to, because I wasn't aware of a need for the additional behaviour, and so I didn't want to introduce an interface that wouldn't get used.
I appreciate very much this "draft" PR to show the complete work! Thanks! 🙏 I had a deeper look now and I definitely like your approach taken so far.
Ideally with the overall design in mind now this can be split several different PRs in this order:
- Implementing multiply-add, possibly with the extended capability using $$\alpha$$ and $$\beta$$...
...and update the adjoint interpolation methods to use this. Because this routine did not exist before, @MarekWlasak implemented the adjoint as first a matrix-multiply followed by a
+=. Perhaps this approach should still be used for the eckit-backend codepaths, and could possibly live inside thematrix_multiply_addimplementation for the eckit-backend (within atlas) itself. - Implement GPU-offloading capability for the new sparse matrix
- Create hicsparse backend
- Adapt interpolation methods to enable the GPU offloading.
That sounds like a great plan, and thanks for taking a look so far! I'll work on submitting those PR's.
I've submitted PR's for the first 2 items (https://github.com/ecmwf/atlas/pull/240 and https://github.com/ecmwf/atlas/pull/241). And, I'll submitted the remaining 2 PRs as their dependencies pass review and get merged in.
Dear @l90lpa , steps 1, 2, 3 above have been incorporated now:
- 1: #240
- 2: #247
- 3: #246
so that all that needs to be done now is step 4
- 4: Adapt interpolation methods to enable the GPU offloading.
Hi @wdeconinck, yes that's right. I have a branch (on a fork) that does that contains an implementation of that adaptation. I just haven't opened a PR because priorities changed but I could revise it and open the PR next week if that works for you?