Simplify GMRES kernels

Open upsj opened this issue 3 years ago • 6 comments

This PR separates the step1 and initialize2 kernels into individual reductions (norm and dot) and axpy/scale operations, which lets us use the simple kernel setup for all of GMRES as well. This will also simplify adding CGS-Arnoldi to plain GMRES (and to distributed GMRES later on).
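To illustrate the idea of composing an Arnoldi step from separate reductions and axpy/scale operations, here is a minimal sketch in plain Python. It is not Ginkgo code: the function names (`dot`, `norm2`, `axpy`, `scale`, `arnoldi_step`) are hypothetical, the vectors are real-valued lists, and modified Gram-Schmidt is assumed as the orthogonalization scheme.

```python
import math


def dot(x, y):
    """Reduction: inner product of two real vectors."""
    return sum(a * b for a, b in zip(x, y))


def norm2(x):
    """Reduction: Euclidean norm of a real vector."""
    return math.sqrt(dot(x, x))


def axpy(alpha, x, y):
    """Element-wise update: y += alpha * x (in place)."""
    for i, xi in enumerate(x):
        y[i] += alpha * xi


def scale(alpha, x):
    """Element-wise update: x *= alpha (in place)."""
    for i in range(len(x)):
        x[i] *= alpha


def arnoldi_step(basis, w):
    """Orthogonalize w against `basis` using only the primitives above.

    Assumes modified Gram-Schmidt. Normalizes w in place and returns
    the new Hessenberg column (one entry per basis vector plus the norm).
    """
    h = []
    for v in basis:
        h_i = dot(v, w)    # separate reduction
        axpy(-h_i, v, w)   # separate element-wise update
        h.append(h_i)
    h_next = norm2(w)      # separate reduction
    if h_next > 0.0:
        scale(1.0 / h_next, w)
    h.append(h_next)
    return h


basis = [[1.0, 0.0, 0.0]]
w = [3.0, 4.0, 0.0]
h = arnoldi_step(basis, w)
print(h, w)  # [3.0, 4.0] [0.0, 1.0, 0.0]
```

Because every reduction is its own call in this sketch, swapping in a different orthogonalization scheme or a different `dot` only touches `arnoldi_step` or the primitive, not a fused kernel.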

The branch is based on simple_kernel_reduction, which is why the changes are a bit obscured. But the base isn't strictly necessary, so I could remove it.

TODO:

  • [ ] Add reference kernel tests
  • [x] Fix DPC++
  • [x] Fix complex GMRES (since we don't have a real * complex scal operation yet, see #864)
  • [ ] Test more edge cases (stopping, finalized, large discrepancy between convergence speeds... across restarts)
  • [ ] ~CB-GMRES (this one will need simple kernel reductions though!)~

upsj avatar Aug 18 '21 15:08 upsj

rebase!

thoasm avatar Sep 13 '22 11:09 thoasm

Error: The following files need to be formatted:

common/unified/solver/gmres_kernels.cpp
hip/base/kernel_launch.hip.hpp
reference/solver/gmres_kernels.cpp
test/solver/gmres_kernels.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

ginkgo-bot avatar Sep 18 '22 20:09 ginkgo-bot

I am not 100% happy with the way I deal with the complex next_krylov norm computation (currently, I need additional storage for that). I am looking into alternative ways, but the rest of the PR should be in a good state.

thoasm avatar Sep 18 '22 20:09 thoasm

format-rebase!

thoasm avatar Sep 18 '22 20:09 thoasm

Error: Rebase failed, see the related Action for details

ginkgo-bot avatar Sep 18 '22 20:09 ginkgo-bot

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed (16 filtered out), 6 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

ginkgo-bot avatar Sep 18 '22 20:09 ginkgo-bot

format-rebase!

thoasm avatar Oct 13 '22 10:10 thoasm

Formatting rebase introduced changes, see Artifacts here to review them

ginkgo-bot avatar Oct 13 '22 10:10 ginkgo-bot

@pratikvn I'm not sure the distributed case will be significantly different. I think you can just duplicate the Hessenberg matrix on each process and then do the Hessenberg solve, etc. locally. The issue before was that the scalar products were computed inside the kernels, so we had no chance to use the distributed scalar products.
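A rough sketch of that argument, simulated in plain Python without any MPI: each "rank" owns a slice of the vectors, computes a local partial dot product, and a sum all-reduce gives every rank the same Hessenberg entry, so the small Hessenberg matrix can be replicated and solved locally. All names here (`local_dot`, `all_reduce_sum`, `distributed_dot`) are hypothetical stand-ins, not Ginkgo or MPI APIs.

```python
def local_dot(x_local, y_local):
    """Partial inner product over the entries owned by one rank."""
    return sum(a * b for a, b in zip(x_local, y_local))


def all_reduce_sum(partials):
    """Stand-in for a sum all-reduce: every rank receives the same total."""
    total = sum(partials)
    return [total for _ in partials]


def distributed_dot(x_parts, y_parts):
    """Global inner product; returns the value as seen on each rank."""
    partials = [local_dot(x, y) for x, y in zip(x_parts, y_parts)]
    return all_reduce_sum(partials)


# Two simulated ranks: rank 0 owns two entries, rank 1 owns one.
v_parts = [[1.0, 2.0], [3.0]]
w_parts = [[4.0, 5.0], [6.0]]

h_per_rank = distributed_dot(v_parts, w_parts)
print(h_per_rank)  # [32.0, 32.0]
```

Since every rank ends up with the identical scalar, the Hessenberg update and solve need no further communication in this model; only the dot and norm reductions do.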

MarcelKoch avatar Oct 21 '22 14:10 MarcelKoch

I finally ran the benchmark and compared the GPU performance between the current develop and gmres_simplify: I tested 8 matrices, and all of them show a speedup of approx. 1.1x or more (meaning the change in this PR actually speeds up GMRES on an A100). This implementation was never slower than what is currently in develop.

thoasm avatar Oct 26 '22 16:10 thoasm

@yhmtsai I ran the same benchmark with 4 RHS:

| Matrix name | develop iters | develop time [s] | gmres_simplify iters | gmres_simplify time [s] |
|---|---|---|---|---|
| G3_circuit | 704 | 9.09 | 704 | 8.10 |
| t2em | 219 | 1.55 | 219 | 1.49 |
| circuit5M_dc | 42 | 0.65 | 42 | 0.55 |
| audikw_1 | 17169 | 196.05 | 12747 | 140.18 |
| Bump_2911 | 40862 | 1205.90 | 40368 | 1058.94 |
| ecology1 | 1219 | 9.81 | 1219 | 9.28 |
| ss | 1089 | 16.14 | 1089 | 14.50 |
| mc2depi | 3345 | 13.10 | 3294 | 13.78 |

Only mc2depi seems to be slower; the rest are faster with this PR.
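For reference, the per-matrix speedup (develop time divided by gmres_simplify time) can be computed from the 4-RHS timings above with a few lines of Python. The variable names are only illustrative; the numbers are copied from the table.

```python
# (develop time [s], gmres_simplify time [s]) for the 4-RHS runs
timings = {
    "G3_circuit": (9.09, 8.10),
    "t2em": (1.55, 1.49),
    "circuit5M_dc": (0.65, 0.55),
    "audikw_1": (196.05, 140.18),
    "Bump_2911": (1205.90, 1058.94),
    "ecology1": (9.81, 9.28),
    "ss": (16.14, 14.50),
    "mc2depi": (13.10, 13.78),
}

# Speedup > 1 means gmres_simplify is faster than develop.
speedups = {name: dev / new for name, (dev, new) in timings.items()}
slower = [name for name, s in speedups.items() if s < 1.0]

for name, s in speedups.items():
    print(f"{name:14s} {s:.2f}x")
print("slower with this PR:", slower)  # ['mc2depi']
```

Note that these are total solve times, so for matrices where the iteration count changed (audikw_1, Bump_2911, mc2depi) the ratio mixes per-iteration cost with convergence behavior.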

thoasm avatar Nov 02 '22 16:11 thoasm

Nice results, thanks Thomas.

tcojean avatar Nov 02 '22 17:11 tcojean

format-rebase!

thoasm avatar Nov 03 '22 13:11 thoasm

Formatting rebase introduced changes, see Artifacts here to review them

ginkgo-bot avatar Nov 03 '22 13:11 ginkgo-bot

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed (16 filtered out), 6 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

ginkgo-bot avatar Nov 03 '22 13:11 ginkgo-bot

Error: PR already merged!

ginkgo-bot avatar Nov 03 '22 20:11 ginkgo-bot