libCEED icon indicating copy to clipboard operation
libCEED copied to clipboard

Efficiently use in-order and out-of-order queues in the SYCL backends

Open uumesh opened this issue 1 year ago • 8 comments

  • Adds the flexibility to use in-order out-of-order queues depending upon the host application.

  • Default to in-order queues.

  • Only order queue as necessary for efficiency (if it is out of order).

  • Sets the same queue recursively through a hierarchy of Ceed objects (for the Ceed_parent and Ceed_delegate objects).

uumesh avatar Jan 30 '24 21:01 uumesh

This all seems reasonable to me. I think all we need is the summary output of make prove-all on an env where these backends can run copied into this MR since we don't have automated CI for SYCL at this time.

jeremylt avatar Feb 07 '24 18:02 jeremylt

In addition to the above suggested change, I also had to apply this patch to get it to compile on Aurora with 2023.12.05.001 compilers:

diff --git i/backends/sycl-ref/ceed-sycl-ref.hpp w/backends/sycl-ref/ceed-sycl-ref.hpp
index 56544c38..fc7bc775 100644
--- i/backends/sycl-ref/ceed-sycl-ref.hpp
+++ w/backends/sycl-ref/ceed-sycl-ref.hpp
@@ -94,7 +94,7 @@ typedef struct {
 } CeedOperatorDiag_Sycl;

 typedef struct {
-  CeedInt     num_elem, block_size_x, block_size_y, elem_per_block;
+  CeedInt     num_elem, block_size_x, block_size_y, elems_per_block;
   CeedInt     num_e_mode_in, num_e_mode_out, num_qpts, num_nodes, block_size, num_comp;  // Kernel parameters
   bool        fallback;
   CeedScalar *d_B_in, *d_B_out;

I'm not sure why the above isn't showing up in the diff for the PR though, as the code on ALCF repo and the code here are definitely different.

jrwrigh avatar Feb 08 '24 20:02 jrwrigh

Testing on Aurora, prove-all shows everything passing except:

petsc-bps ................................ Failed 1/2 subtests
petsc-dmswarm ............................ ok
ex1-volume ............................... ok
ex2-surface .............................. ok
petsc-multigrid .......................... Failed 1/2 subtests
petsc-bpsswarm ........................... ok
solids-elasticity ........................ ok
fluids-navierstokes ...................... Failed 15/22 subtests

Test Summary Report
-------------------
fluids-py-smartsim_regression_framework (Wstat: 256 Tests: 0 Failed: 0)
  Non-zero exit status: 1
  Parse errors: No plan found in TAP output
petsc-bps                              (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  1
petsc-multigrid                        (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  2
fluids-navierstokes                    (Wstat: 0 Tests: 22 Failed: 15)
  Failed tests:  1, 3-4, 6-14, 17, 20-21

The PETSc failures are just due to the fact that I didn't have tetgen working, so it couldn't run the test with tetrahedral elements.

The fluids examples have some issue with segmentation faults, but I haven't had a chance to see where those faults are happening.

jrwrigh avatar Feb 08 '24 20:02 jrwrigh

We should probably allow skipping tests that need tetgen in the test suite, like Ratel does

jeremylt avatar Feb 08 '24 20:02 jeremylt

Segmentation faults occur in the SYCL Kokkos implementation and are present on main as well. I'll go ahead and say that the PR is good to go with the code changes mentioned above.

For reference, the backtrace of the seg-faults is:

#0  0x000015551339d872 in oneapi::mkl::sparse::gpu::doptimize_gemvUsm_impl_i4(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
   from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#1  0x0000155513ba7098 in oneapi::mkl::sparse::dispatch_optimize_gemv(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
   from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#2  0x000015551164561d in oneapi::mkl::sparse::optimize_gemv(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
   from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#3  0x0000155553595068 in KokkosSparse::Impl::SPMV<Kokkos::Experimental::SYCL, KokkosSparse::CrsMatrix<double const, int const, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u
>, int const>, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<3u> >, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kok
kos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u> >, true, false>::spmv(Kokkos::Experimental::SYCL const&, KokkosKernels::Experimental::Controls const&, char const*, double const&, KokkosSparse:
:CrsMatrix<double const, int const, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u>, int const> const&, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos:
:Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<3u> > const&, double const&, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpa
ce>, Kokkos::MemoryTraits<1u> > const&) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#4  0x0000155553590760 in void KokkosSparse::spmv<double, KokkosSparse::CrsMatrix<double, int, Kokkos::Experimental::SYCLDeviceUSMSpace, void, int>, Kokkos::View<double const*, Kokkos::Experimental::SYCLDeviceUSMSpace>, double, Kokkos::V
iew<double*, Kokkos::Experimental::SYCLDeviceUSMSpace> >(char const*, double const&, KokkosSparse::CrsMatrix<double, int, Kokkos::Experimental::SYCLDeviceUSMSpace, void, int> const&, Kokkos::View<double const*, Kokkos::Experimental::SYCL
DeviceUSMSpace> const&, double const&, Kokkos::View<double*, Kokkos::Experimental::SYCLDeviceUSMSpace> const&) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#5  0x0000155553545232 in MatMult_SeqAIJKokkos(_p_Mat*, _p_Vec*, _p_Vec*) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#6  0x00001555530f1359 in MatMult () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#7  0x0000155553af6997 in PCApplyBAorAB () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#8  0x0000155553a84075 in KSPSolve_GMRES () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#9  0x0000155553ae8df2 in KSPSolve_Private () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#10 0x0000155553ae8710 in KSPSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#11 0x0000155553cd8467 in SNESSolve_NEWTONLS () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#12 0x0000155553cbd8c5 in SNESSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#13 0x0000155553d70031 in TSStep_Theta () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#14 0x0000155553d4057d in TSStep () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#15 0x0000155553d41c53 in TSSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#16 0x0000000000463b32 in TSSolve_NS (dm=<optimized out>, user=0x63ae800, app_ctx=0x63b65e0, phys=<optimized out>, Q=Q@entry=0x7fffffff6318, f_time=f_time@entry=0x7fffffff6338, ts=0x7fffffff6330)
    at /tmp/libCEED_main/examples/fluids/src/setupts.c:495
#17 0x000000000040d207 in main (argc=21, argv=0x7fffffff6468) at /tmp/libCEED_main/examples/fluids/navierstokes.c:248

jrwrigh avatar Feb 08 '24 22:02 jrwrigh

Yeah, that segfault seems related to PETSc sparse Kokkos matrices and not libCEED. Seems like we're good to merge once the two fixes above are applied.

jeremylt avatar Feb 13 '24 17:02 jeremylt

Note, this PR will need to be updated (rebase or similar) due to minor updates in main

jeremylt avatar Feb 29 '24 18:02 jeremylt

This looks mostly ready, minus some minor changes. Anything I can do to help get this merged?

jeremylt avatar Mar 05 '24 22:03 jeremylt

Anything I can do to help this PR move? I worry about it going stale

jeremylt avatar Apr 02 '24 20:04 jeremylt

Is a rebase with main the only thing that needs to be done to get the PR merged?

uumesh avatar Apr 02 '24 20:04 uumesh

A rebase and a clean test suite run is all that's needed

jeremylt avatar Apr 02 '24 20:04 jeremylt

@jeremylt Looks like gitlab CI isn't running on this branch for some reason. Obviously the functional testing needs to be done on Sunspot, but am a bit confused why it's not running here. Maybe since the branch is from a fork?

jrwrigh avatar Apr 03 '24 18:04 jrwrigh

Yup, CI on Noether won't run on PRs from forks

jeremylt avatar Apr 03 '24 18:04 jeremylt

I suppose we'd like for the noether-sycl job to run. You can push the branch to this repo (I don't think it'll need to create a PR) or to the GitLab repo, where you should be able to see the pipeline. https://gitlab.com/libceed/libCEED/-/pipelines

jedbrown avatar Apr 03 '24 18:04 jedbrown

In this case, since we need a manual full run of the test suite, I don't think we also need Noether to confirm that the SYCL backend builds with these changes.

jeremylt avatar Apr 03 '24 23:04 jeremylt

Ready in my book

jeremylt avatar Apr 04 '24 16:04 jeremylt