
[WIP] OCCA Backend Update

Open kris-rowe opened this issue 3 years ago • 13 comments

When complete, this PR will update the OCCA backend to be compatible with OCCA v1.4 and add support for the OCCA OpenCL and SYCL backends.

This PR will close #816. This PR conflicts with #1007.

kris-rowe avatar Aug 19 '22 22:08 kris-rowe

Thanks @kris-rowe! Feel free to tag me or @jeremylt at any time.

jedbrown avatar Aug 19 '22 22:08 jedbrown

Great, I see this compiling. It is probably an easy fix, but I get this failure:

$ make test search=t001 BACKENDS=/cpu/self/occa
[...]
not ok 3 t001-ceed /cpu/self/occa stderr
# +/home/jed/src/libCEED/backends/occa/ceed-occa.cpp:336 in registerBackend():
# +---[ Error ]--------------------------------------------------------------------
# +File     : /home/jed/src/occa/src/types/json.cpp
# +Line     : 491
# +Function : operator[]
# +Message  : Path '' is not an object
# +Stack
# +4 build/t001-ceed(+0x11a6)
# +3 /usr/lib/libc.so.6(+0x232d0)
# +2 /usr/lib/libc.so.6(__libc_start_main+0x8a)
# +1 build/t001-ceed(+0x10a5)

Building with OPT='-g' gives this trace:

#0  0x00007ffff776a4dc in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff771a998 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff770453d in abort () from /usr/lib/libc.so.6
#3  0x00007ffff7a32887 in CeedErrorAbort (ceed=0x5555558af080, filename=0x7ffff7bb36b8 "/home/jed/src/libCEED/backends/occa/ceed-occa.cpp", line_no=336, func=0x7ffff7bb37c9 "registerBackend", err_code=-2, format=0x5555558b4010 "\n---[ Error ]", '-' <repeats 68 times>, "\n    File     : /home/jed/src/occa/src/types/json.cpp\n    Line     : 491\n    Function : operator[]\n    Message  : Path "..., args=0x7fffffffd060) at /home/jed/src/libCEED/interface/ceed.c:1245
#4  0x00007ffff7a325c5 in CeedErrorImpl (ceed=0x5555558af080, filename=0x7ffff7bb36b8 "/home/jed/src/libCEED/backends/occa/ceed-occa.cpp", lineno=336, func=0x7ffff7bb37c9 "registerBackend", ecode=-2, format=0x5555558b4010 "\n---[ Error ]", '-' <repeats 68 times>, "\n    File     : /home/jed/src/occa/src/types/json.cpp\n    Line     : 491\n    Function : operator[]\n    Message  : Path "...) at /home/jed/src/libCEED/interface/ceed.c:1168
#5  0x00007ffff7aa3b42 in ceed::occa::registerBackend (resource=0x7fffffffde0b "/cpu/self/occa", ceed=0x5555558af080) at /home/jed/src/libCEED/backends/occa/ceed-occa.cpp:336
#6  0x00007ffff7a31e0a in CeedInit (resource=0x7fffffffde0b "/cpu/self/occa", ceed=0x7fffffffd7e0) at /home/jed/src/libCEED/interface/ceed.c:967
#7  0x00005555555551b8 in main (argc=2, argv=0x7fffffffd908) at /home/jed/src/libCEED/tests/t001-ceed.c:10

jedbrown avatar Aug 22 '22 21:08 jedbrown

I tracked down the configuration issue. Will see if the rest of the tests pass locally now.

kris-rowe avatar Aug 22 '22 22:08 kris-rowe

Our CI is back online so I think we'll be able to test this now with CUDA and/or ROCm. We have oneAPI compilers installed, though not with Intel hardware. What would you consider a good choice for testing?

jedbrown avatar Sep 06 '22 02:09 jedbrown

I can run this variant of the channel test (which includes output validation) with /cpu/self/occa and /gpu/cuda, but it fails with /gpu/cuda/occa.

$ build/fluids-navierstokes -options_file examples/fluids/channel.yaml -compare_final_state_atol 2e-11 -compare_final_state_filename examples/fluids/tests-output/fluids-navierstokes-channel.bin -dm_plex_box_faces 5,5,1 -ts_max_steps 5 -ceed /cpu/self/occa -dm_mat_preallocate_skip 0 -snes_fd_color

The trace is:

Thread 1 "fluids-navierst" received signal SIGSEGV, Segmentation fault.
0x00007fffbae86c51 in occa::modeMemory_t::addMemoryRef(occa::memory*) () from /home/jed/src/occa/lib/libocca.so
(gdb) bt
#0  0x00007fffbae86c51 in occa::modeMemory_t::addMemoryRef(occa::memory*) () from /home/jed/src/occa/lib/libocca.so
#1  0x00007ffff5aeed56 in ceed::occa::arrayToMemory<double> (array=array@entry=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.hpp:27
#2  0x00007ffff5aecdef in ceed::occa::Vector::useArrayPointer (this=0x555559006800, mtype=<optimized out>, array=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:246
#3  0x00007ffff5aed2f0 in ceed::occa::Vector::setArray (this=0x555559006800, mtype=mtype@entry=CEED_MEM_DEVICE, cmode=cmode@entry=CEED_USE_POINTER, array=array@entry=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:166
#4  0x00007ffff5aed366 in ceed::occa::Vector::ceedSetArray (vec=<optimized out>, mtype=CEED_MEM_DEVICE, cmode=CEED_USE_POINTER, array=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:450
#5  0x00007ffff5aac1fe in CeedVectorSetArray (vec=0x5555590f9920, mem_type=<optimized out>, copy_mode=copy_mode@entry=CEED_USE_POINTER, array=<optimized out>) at /home/jed/src/libCEED/interface/ceed-vector.c:274
#6  0x000055555557d9ea in ICs_FixMultiplicity (dm=0x555555d65480, ceed_data=0x555555d11720, user=<optimized out>, Q_loc=0x55555911d3b0, Q=0x555559118b30, time=<optimized out>, time@entry=0) at /home/jed/src/libCEED/examples/fluids/src/misc.c:38
#7  0x000055555555a4ef in main (argc=<optimized out>, argv=<optimized out>) at /home/jed/src/libCEED/examples/fluids/navierstokes.c:174

I'm just testing locally so far; does this work for you at JLSE?

jedbrown avatar Sep 07 '22 16:09 jedbrown

> Our CI is back online so I think we'll be able to test this now with CUDA and/or ROCm. We have oneAPI compilers installed, though not with Intel hardware. What would you consider a good choice for testing?

The OCCA CUDA backend should be sufficient for testing. If you really wanted to test the SYCL backend specifically, you could build the public Intel LLVM compilers with the SYCL CUDA plugin enabled.

kris-rowe avatar Sep 08 '22 16:09 kris-rowe

Correct me if I'm wrong, but I understand from the comments and commits that the core tests, t1*-t5* and ex1, ex2 all pass with these changes.

In preparation for our upcoming release, I'd like to get this initial work merged. Specifically, I would like to merge the changes in Makefile, backends/occa/*, and backends/ceed-backend-list.h, and OMIT the changes in examples/*, tests/*, and include/ceed/jit-source/gallery/*.

I think a couple of changes still need to be made:

  • [x] Update for changes in main (rebase or merge?)
  • [ ] Move files used in JiT to include/ceed/jit-source/occa
  • [x] Update tests/junit.py (replaces tests/tap.sh) to only run tests for t1*-t3* on OCCA backends
  • [x] Update README.md with new OCCA min version and backends
  • [x] Update docs/sphinx/source/releasenotes.md

I can help with any or all of those changes.
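
To make the test restriction in the checklist concrete, here is a minimal Python sketch of a filter that runs only t1*-t3* on OCCA backends. This is not the actual logic in tests/junit.py; the names `OCCA_TEST_PATTERNS` and `should_run` are hypothetical, the test names are only illustrative, and detecting an OCCA backend by looking for "/occa" in the resource string is an assumption.

```python
import fnmatch

# Hypothetical pattern list; the real tests/junit.py may express this differently.
OCCA_TEST_PATTERNS = ("t1*", "t2*", "t3*")


def should_run(test_name, backend):
    """Return True if test_name should run on the given backend resource.

    Non-OCCA backends run every test. OCCA backends (assumed here to be any
    resource string containing "/occa") run only tests matching t1*-t3*.
    """
    if "/occa" not in backend:
        return True
    return any(fnmatch.fnmatchcase(test_name, pattern) for pattern in OCCA_TEST_PATTERNS)


if __name__ == "__main__":
    for name, resource in [
        ("t100-vector", "/cpu/self/occa"),
        ("t500-operator", "/gpu/cuda/occa"),
        ("t500-operator", "/gpu/cuda"),
    ]:
        print(name, resource, should_run(name, resource))
```

Keeping the patterns in one tuple would make it easy to widen the range as more of the OCCA backend passes.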

jeremylt avatar Sep 26 '22 15:09 jeremylt

Thanks for all the hard work here @kris-rowe!

PETSc released last week, so I'd like to tidy up the big open PRs and get a libCEED release soon. I don't want to step on any toes, but I have the time to run down those small tasks I listed above so we get this PR into the release if that's ok with you.

jeremylt avatar Oct 11 '22 15:10 jeremylt

> PETSc released last week, so I'd like to tidy up the big open PRs and get a libCEED release soon. I don't want to step on any toes, but I have the time to run down those small tasks I listed above so we get this PR into the release if that's ok with you.

If you have time, that would be a huge help.

kris-rowe avatar Oct 11 '22 15:10 kris-rowe

Ok, https://github.com/CEED/libCEED/pull/1072 should be ready to merge now.

jeremylt avatar Oct 12 '22 16:10 jeremylt

Shall we rebase this after #1072? And is it possible to make block diagonal assembly fall back to CPU for now? This will allow us to run ceed-fluids on PVC with a narrowly localized performance distortion.

jedbrown avatar Oct 13 '22 20:10 jedbrown

Here's a suggested test.

diff --git i/examples/fluids/blasius.yaml w/examples/fluids/blasius.yaml
index cf3056b1e..fd7516ee6 100644
--- i/examples/fluids/blasius.yaml
+++ w/examples/fluids/blasius.yaml
@@ -4,7 +4,7 @@ implicit: true
 ts:
   adapt_type: 'none'
   type: 'beuler'
-  dt: 0.2e-5
+  dt: 0.1e-5
   max_time: 1.0e-3
 output_freq: 10
 
@@ -51,3 +51,15 @@ stg:
   use: false
   inflow_path: "./STGInflow_blasius.dat"
   mean_only: true
+
+pmat_pbdiagonal:
+ksp_type: bcgsl
+pc_type: vpbjacobi
+amat_type: shell
+
+# monitors
+ts_monitor:
+snes_monitor:
+ksp_converged_reason:
+
+ceed: /gpu/cuda

It should produce output like

$ build/fluids-navierstokes -options_file examples/fluids/blasius.yaml
[...]
0 TS dt 1e-06 time 0.
    0 SNES Function norm 6.801978174922e-04
    Linear solve converged due to CONVERGED_RTOL iterations 222
    1 SNES Function norm 1.515152456321e-07
    Linear solve converged due to CONVERGED_RTOL iterations 286
    2 SNES Function norm 4.628934945817e-11
1 TS dt 1e-06 time 1e-06
[...]
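
If it helps when comparing backends, the monitor output above can be checked mechanically. The following is a rough Python sketch, not part of libCEED or PETSc; `parse_monitor` and `norms_decrease` are hypothetical helpers, and the regular expressions only cover the three line shapes shown in the sample.

```python
import re

# Lines copied from the expected output shown above.
SAMPLE = """\
0 TS dt 1e-06 time 0.
    0 SNES Function norm 6.801978174922e-04
    Linear solve converged due to CONVERGED_RTOL iterations 222
    1 SNES Function norm 1.515152456321e-07
    Linear solve converged due to CONVERGED_RTOL iterations 286
    2 SNES Function norm 4.628934945817e-11
1 TS dt 1e-06 time 1e-06
"""

TS_RE = re.compile(r"^\s*(\d+) TS dt (\S+) time (\S+)")
SNES_RE = re.compile(r"^\s*(\d+) SNES Function norm (\S+)")
KSP_RE = re.compile(r"^\s*Linear solve converged due to (\w+) iterations (\d+)")


def parse_monitor(text):
    """Group SNES norms and KSP iteration counts under the TS line that precedes them."""
    steps = []
    for line in text.splitlines():
        if m := TS_RE.match(line):
            steps.append({"step": int(m.group(1)), "snes_norms": [], "ksp_iterations": []})
        elif steps and (m := SNES_RE.match(line)):
            steps[-1]["snes_norms"].append(float(m.group(2)))
        elif steps and (m := KSP_RE.match(line)):
            steps[-1]["ksp_iterations"].append(int(m.group(2)))
    return steps


def norms_decrease(step):
    """True if the SNES function norm strictly decreases within one time step."""
    norms = step["snes_norms"]
    return all(later < earlier for earlier, later in zip(norms, norms[1:]))


if __name__ == "__main__":
    for step in parse_monitor(SAMPLE):
        print(step["step"], step["snes_norms"], step["ksp_iterations"], norms_decrease(step))
```

Exact norms and iteration counts may differ slightly between backends, so a comparison built on this would likely need a tolerance rather than equality.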

jedbrown avatar Dec 02 '22 16:12 jedbrown

For performance monitoring, you can add -log_view -log_view_gpu_time.

jedbrown avatar Dec 02 '22 16:12 jedbrown