libCEED
libCEED copied to clipboard
Efficiently use in-order and out-of-order queues in the SYCL backends
-
Adds the flexibility to use in-order out-of-order queues depending upon the host application.
-
Default to in-order queues.
-
Only order queue as necessary for efficiency (if it is out of order).
-
Sets the same queue recursively through a hierarchy of Ceed objects (for the Ceed_parent and Ceed_delegate objects).
This all seems reasonable to me. I think all we need is the summary output of make prove-all on an env where these backends can run copied into this MR since we don't have automated CI for SYCL at this time.
In addition to the above suggested change, I also had to apply this patch to get it to compile on Aurora with 2023.12.05.001 compilers:
diff --git i/backends/sycl-ref/ceed-sycl-ref.hpp w/backends/sycl-ref/ceed-sycl-ref.hpp
index 56544c38..fc7bc775 100644
--- i/backends/sycl-ref/ceed-sycl-ref.hpp
+++ w/backends/sycl-ref/ceed-sycl-ref.hpp
@@ -94,7 +94,7 @@ typedef struct {
} CeedOperatorDiag_Sycl;
typedef struct {
- CeedInt num_elem, block_size_x, block_size_y, elem_per_block;
+ CeedInt num_elem, block_size_x, block_size_y, elems_per_block;
CeedInt num_e_mode_in, num_e_mode_out, num_qpts, num_nodes, block_size, num_comp; // Kernel parameters
bool fallback;
CeedScalar *d_B_in, *d_B_out;
I'm not sure why the above isn't showing up in the diff for the PR though, as the code on ALCF repo and the code here are definitely different.
Testing on Aurora, prove-all shows everything passing except:
petsc-bps ................................ Failed 1/2 subtests
petsc-dmswarm ............................ ok
ex1-volume ............................... ok
ex2-surface .............................. ok
petsc-multigrid .......................... Failed 1/2 subtests
petsc-bpsswarm ........................... ok
solids-elasticity ........................ ok
fluids-navierstokes ...................... Failed 15/22 subtests
Test Summary Report
-------------------
fluids-py-smartsim_regression_framework (Wstat: 256 Tests: 0 Failed: 0)
Non-zero exit status: 1
Parse errors: No plan found in TAP output
petsc-bps (Wstat: 0 Tests: 2 Failed: 1)
Failed test: 1
petsc-multigrid (Wstat: 0 Tests: 2 Failed: 1)
Failed test: 2
fluids-navierstokes (Wstat: 0 Tests: 22 Failed: 15)
Failed tests: 1, 3-4, 6-14, 17, 20-21
The PETSc failures are just due to the fact that I didn't have tetgen working, so it couldn't run the test with tetrahedral elements.
The fluids examples have some issue with segmentation faults, but I haven't had a chance to see where those faults are happening.
We should probably allow skipping tests that need tetgen in the test suite, like Ratel does
Segmentation faults occur in the SYCL Kokkos implementation and are present on main as well. I'll go ahead and say that the PR is good to go with the code changes mentioned above.
For reference, the backtrace of the seg-faults is:
#0 0x000015551339d872 in oneapi::mkl::sparse::gpu::doptimize_gemvUsm_impl_i4(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#1 0x0000155513ba7098 in oneapi::mkl::sparse::dispatch_optimize_gemv(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#2 0x000015551164561d in oneapi::mkl::sparse::optimize_gemv(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::sparse::matrix_handle*, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
from /soft/compilers/oneapi/2023.12.15.001/oneapi/mkl/2024.0/lib/libmkl_sycl_sparse.so.4
#3 0x0000155553595068 in KokkosSparse::Impl::SPMV<Kokkos::Experimental::SYCL, KokkosSparse::CrsMatrix<double const, int const, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u
>, int const>, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<3u> >, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kok
kos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u> >, true, false>::spmv(Kokkos::Experimental::SYCL const&, KokkosKernels::Experimental::Controls const&, char const*, double const&, KokkosSparse:
:CrsMatrix<double const, int const, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<1u>, int const> const&, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos:
:Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpace>, Kokkos::MemoryTraits<3u> > const&, double const&, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Experimental::SYCL, Kokkos::Experimental::SYCLDeviceUSMSpa
ce>, Kokkos::MemoryTraits<1u> > const&) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#4 0x0000155553590760 in void KokkosSparse::spmv<double, KokkosSparse::CrsMatrix<double, int, Kokkos::Experimental::SYCLDeviceUSMSpace, void, int>, Kokkos::View<double const*, Kokkos::Experimental::SYCLDeviceUSMSpace>, double, Kokkos::V
iew<double*, Kokkos::Experimental::SYCLDeviceUSMSpace> >(char const*, double const&, KokkosSparse::CrsMatrix<double, int, Kokkos::Experimental::SYCLDeviceUSMSpace, void, int> const&, Kokkos::View<double const*, Kokkos::Experimental::SYCL
DeviceUSMSpace> const&, double const&, Kokkos::View<double*, Kokkos::Experimental::SYCLDeviceUSMSpace> const&) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#5 0x0000155553545232 in MatMult_SeqAIJKokkos(_p_Mat*, _p_Vec*, _p_Vec*) () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#6 0x00001555530f1359 in MatMult () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#7 0x0000155553af6997 in PCApplyBAorAB () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#8 0x0000155553a84075 in KSPSolve_GMRES () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#9 0x0000155553ae8df2 in KSPSolve_Private () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#10 0x0000155553ae8710 in KSPSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#11 0x0000155553cd8467 in SNESSolve_NEWTONLS () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#12 0x0000155553cbd8c5 in SNESSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#13 0x0000155553d70031 in TSStep_Theta () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#14 0x0000155553d4057d in TSStep () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#15 0x0000155553d41c53 in TSSolve () from /home/jrwrigh/software/petsc/arch_12-15_framework/lib/libpetsc.so.3.020
#16 0x0000000000463b32 in TSSolve_NS (dm=<optimized out>, user=0x63ae800, app_ctx=0x63b65e0, phys=<optimized out>, Q=Q@entry=0x7fffffff6318, f_time=f_time@entry=0x7fffffff6338, ts=0x7fffffff6330)
at /tmp/libCEED_main/examples/fluids/src/setupts.c:495
#17 0x000000000040d207 in main (argc=21, argv=0x7fffffff6468) at /tmp/libCEED_main/examples/fluids/navierstokes.c:248
Yeah, that segfault seems related to PETSc sparse Kokkos matrices and not libCEED. Seems like we're good to merge once the two fixes above are applied.
Note, this PR will need to be updated (rebase or similar) due to minor updates in main
This looks mostly ready, minus some minor changes. Anything I can do to help get this merged?
Anything I can do to help this PR move? I worry about it going stale
Is a rebase with main the only thing that needs to be done to get the PR merged?
A rebase and a clean test suite run is all that's needed
@jeremylt Looks like gitlab CI isn't running on this branch for some reason. Obviously the functional testing needs to be done on Sunspot, but am a bit confused why it's not running here. Maybe since the branch is from a fork?
Yup, CI on Noether won't run on PRs from forks
I suppose we'd like for the noether-sycl job to run. You can push the branch to this repo (I don't think it'll need to create a PR) or to the GitLab repo, where you should be able to see the pipeline. https://gitlab.com/libceed/libCEED/-/pipelines
In this case, since we need a manual full run of the test suite, I don't think we also need Noether to confirm that the SYCL backend builds with these changes.
Ready in my book