ginkgo Simplify GMRES kernels

This PR separates the step1 and initialize2 kernels into individual reductions (norm and dot) and axpy/scale operations, which allows us to use the simple kernel setup for all of GMRES as well. This will also simplify the addition of CGS-Arnoldi to plain GMRES (and to distributed GMRES later on).
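For readers unfamiliar with the split, here is a minimal sketch of an Arnoldi step written purely in terms of separate reductions (dot, norm) and element-wise updates (axpy, scale). It is not the Ginkgo kernel code: the names `Vec`, `dot`, `norm2`, `axpy`, `scale` and `arnoldi_step` are hypothetical, it uses plain `std::vector<double>` on the host, and it assumes modified Gram-Schmidt orthogonalization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // hypothetical stand-in for a dense vector

// Reduction: dot product of two vectors of equal length.
double dot(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Reduction: Euclidean norm, expressed through the dot product.
double norm2(const Vec& x) { return std::sqrt(dot(x, x)); }

// Element-wise update: y += alpha * x.
void axpy(double alpha, const Vec& x, Vec& y)
{
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Element-wise update: x *= alpha.
void scale(double alpha, Vec& x)
{
    for (auto& value : x) {
        value *= alpha;
    }
}

// One modified Gram-Schmidt Arnoldi step (assumed variant for this sketch).
// Orthogonalizes w against all basis vectors, normalizes it in place and
// returns the new Hessenberg column with basis.size() + 1 entries.
Vec arnoldi_step(const std::vector<Vec>& basis, Vec& w)
{
    Vec h(basis.size() + 1, 0.0);
    for (std::size_t k = 0; k < basis.size(); ++k) {
        h[k] = dot(basis[k], w);   // reduction
        axpy(-h[k], basis[k], w);  // element-wise
    }
    h.back() = norm2(w);  // reduction
    if (h.back() != 0.0) {
        scale(1.0 / h.back(), w);  // element-wise
    }
    return h;
}
```

Because every operation is either a plain reduction or a plain element-wise loop, each one can be swapped out independently, which is the property the description above relies on.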

The branch is based on simple_kernel_reduction, which is why the changes are a bit obscured. But the base isn't strictly necessary, so I could remove it.

TODO:

[ ] Add reference kernel tests
[x] Fix DPC++
[x] Fix complex GMRES (since we don't have real * complex scal operation yet - #864 )
[ ] Test more edge cases (stopping, finalized, large discrepancy between convergence speeds... across restarts)
[ ] ~CB-GMRES (this one will need simple kernel reductions though!)~

Aug 18 '21 15:08 upsj

rebase!

Sep 13 '22 11:09 thoasm

Error: The following files need to be formatted:

common/unified/solver/gmres_kernels.cpp
hip/base/kernel_launch.hip.hpp
reference/solver/gmres_kernels.cpp
test/solver/gmres_kernels.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

Sep 18 '22 20:09 ginkgo-bot

I am not 100% happy with the way I deal with the complex next_krylov norm computation (currently, I need additional storage for that). I am looking into alternative ways, but the rest of the PR should be in a good state.

Sep 18 '22 20:09 thoasm

format-rebase!

Sep 18 '22 20:09 thoasm

Error: Rebase failed, see the related Action for details

Sep 18 '22 20:09 ginkgo-bot

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed (16 filtered out), 6 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

Sep 18 '22 20:09 ginkgo-bot

format-rebase!

Oct 13 '22 10:10 thoasm

Formatting rebase introduced changes, see Artifacts here to review them

Oct 13 '22 10:10 ginkgo-bot

@pratikvn I'm not so sure if the distributed case will be significantly different. I think you can just duplicate the hessenberg matrix on each process and then just do the hessenberg solve, etc. locally. The issue before was that the scalar products were done within the kernels, so we had no chance to use the distributed scalar products.
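To illustrate the idea, here is a rough host-only sketch with no MPI involved: each simulated rank owns a block of rows, the only cross-rank step is summing the partial dot products (standing in for an all-reduce), and every rank would end up with the same Hessenberg coefficients. `RankData`, `local_dot` and `distributed_mgs_coefficients` are made-up names for this example, not Ginkgo API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical per-rank storage: the local rows of every basis vector and of
// the vector w that is being orthogonalized.
struct RankData {
    std::vector<Vec> basis;
    Vec w;
};

// Purely local partial dot product over the rows a rank owns.
double local_dot(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Modified Gram-Schmidt coefficients across all simulated ranks.
// The loop over ranks that sums partial dots stands in for an all-reduce;
// afterwards each rank could store the same value in its own copy of the
// Hessenberg matrix and continue with purely local work.
Vec distributed_mgs_coefficients(std::vector<RankData>& ranks)
{
    assert(!ranks.empty());
    const std::size_t num_basis = ranks.front().basis.size();
    Vec h(num_basis, 0.0);
    for (std::size_t k = 0; k < num_basis; ++k) {
        double global = 0.0;
        for (const auto& rank : ranks) {
            global += local_dot(rank.basis[k], rank.w);  // "all-reduce"
        }
        h[k] = global;
        for (auto& rank : ranks) {
            // Local axpy on the rows this rank owns: w -= h[k] * basis[k].
            for (std::size_t i = 0; i < rank.w.size(); ++i) {
                rank.w[i] -= global * rank.basis[k][i];
            }
        }
    }
    return h;
}
```

The Hessenberg update, Givens rotations and the small triangular solve would then operate on the replicated coefficients without any further communication.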

Oct 21 '22 14:10 MarcelKoch

I finally ran the benchmark and compared the GPU performance between the current develop and gmres_simplify: I tested 8 matrices, and all of them show a speedup of approx. 1.1 or more (meaning the change in this PR actually speeds up GMRES on an A100). This implementation was never slower than what is currently in develop.

Oct 26 '22 16:10 thoasm

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
65 Code Smells

66.1% Coverage
8.9% Duplication

Nov 01 '22 10:11 sonarqubecloud[bot]

@yhmtsai I ran the same benchmark with 4 RHS:

Matrix name	develop iters	develop time [s]	gmres_simplify iters	gmres_simplify time [s]
G3_circuit	704	9.09	704	8.10
t2em	219	1.55	219	1.49
circuit5M_dc	42	0.65	42	0.55
audikw_1	17169	196.05	12747	140.18
Bump_2911	40862	1205.90	40368	1058.94
ecology1	1219	9.81	1219	9.28
ss	1089	16.14	1089	14.50
mc2depi	3345	13.10	3294	13.78

Only mc2depi is slower (13.78 s vs. 13.10 s); all other matrices are faster with this PR.

Nov 02 '22 16:11 thoasm

Nice results, thanks Thomas.

Nov 02 '22 17:11 tcojean

format-rebase!

Nov 03 '22 13:11 thoasm

Formatting rebase introduced changes, see Artifacts here to review them

Nov 03 '22 13:11 ginkgo-bot

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed (16 filtered out), 6 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

Nov 03 '22 13:11 ginkgo-bot

Error: PR already merged!

Nov 03 '22 20:11 ginkgo-bot
