libCEED
[WIP] OCCA Backend Update
When complete, this PR will update the OCCA backend to be compatible with OCCA v1.4 and add support for the OCCA OpenCL and SYCL backends.
This PR will close #816. This PR conflicts with #1007.
Thanks @kris-rowe! Feel free to tag in myself or @jeremylt at any time.
Great, I see this compiling. This issue is probably an easy fix, but I get this failure.
$ make test search=t001 BACKENDS=/cpu/self/occa
[...]
not ok 3 t001-ceed /cpu/self/occa stderr
# +/home/jed/src/libCEED/backends/occa/ceed-occa.cpp:336 in registerBackend():
# +---[ Error ]--------------------------------------------------------------------
# +File : /home/jed/src/occa/src/types/json.cpp
# +Line : 491
# +Function : operator[]
# +Message : Path '' is not an object
# +Stack
# +4 build/t001-ceed(+0x11a6)
# +3 /usr/lib/libc.so.6(+0x232d0)
# +2 /usr/lib/libc.so.6(__libc_start_main+0x8a)
# +1 build/t001-ceed(+0x10a5)
Building with OPT='-g' gives this trace
#0 0x00007ffff776a4dc in ?? () from /usr/lib/libc.so.6
#1 0x00007ffff771a998 in raise () from /usr/lib/libc.so.6
#2 0x00007ffff770453d in abort () from /usr/lib/libc.so.6
#3 0x00007ffff7a32887 in CeedErrorAbort (ceed=0x5555558af080, filename=0x7ffff7bb36b8 "/home/jed/src/libCEED/backends/occa/ceed-occa.cpp", line_no=336, func=0x7ffff7bb37c9 "registerBackend", err_code=-2, format=0x5555558b4010 "\n---[ Error ]", '-' <repeats 68 times>, "\n File : /home/jed/src/occa/src/types/json.cpp\n Line : 491\n Function : operator[]\n Message : Path "..., args=0x7fffffffd060) at /home/jed/src/libCEED/interface/ceed.c:1245
#4 0x00007ffff7a325c5 in CeedErrorImpl (ceed=0x5555558af080, filename=0x7ffff7bb36b8 "/home/jed/src/libCEED/backends/occa/ceed-occa.cpp", lineno=336, func=0x7ffff7bb37c9 "registerBackend", ecode=-2, format=0x5555558b4010 "\n---[ Error ]", '-' <repeats 68 times>, "\n File : /home/jed/src/occa/src/types/json.cpp\n Line : 491\n Function : operator[]\n Message : Path "...) at /home/jed/src/libCEED/interface/ceed.c:1168
#5 0x00007ffff7aa3b42 in ceed::occa::registerBackend (resource=0x7fffffffde0b "/cpu/self/occa", ceed=0x5555558af080) at /home/jed/src/libCEED/backends/occa/ceed-occa.cpp:336
#6 0x00007ffff7a31e0a in CeedInit (resource=0x7fffffffde0b "/cpu/self/occa", ceed=0x7fffffffd7e0) at /home/jed/src/libCEED/interface/ceed.c:967
#7 0x00005555555551b8 in main (argc=2, argv=0x7fffffffd908) at /home/jed/src/libCEED/tests/t001-ceed.c:10
I tracked down the configuration issue. Will see if the rest of the tests pass locally now.
Our CI is back online so I think we'll be able to test this now with CUDA and/or ROCm. We have oneAPI compilers installed, though not with Intel hardware. What would you consider a good choice for testing?
I can run this variant of the channel test (which includes output validation) with /cpu/self/occa and /gpu/cuda, but it fails with /gpu/cuda/occa.
$ build/fluids-navierstokes -options_file examples/fluids/channel.yaml -compare_final_state_atol 2e-11 -compare_final_state_filename examples/fluids/tests-output/fluids-navierstokes-channel.bin -dm_plex_box_faces 5,5,1 -ts_max_steps 5 -ceed /cpu/self/occa -dm_mat_preallocate_skip 0 -snes_fd_color
The trace is:
Thread 1 "fluids-navierst" received signal SIGSEGV, Segmentation fault.
0x00007fffbae86c51 in occa::modeMemory_t::addMemoryRef(occa::memory*) () from /home/jed/src/occa/lib/libocca.so
(gdb) bt
#0 0x00007fffbae86c51 in occa::modeMemory_t::addMemoryRef(occa::memory*) () from /home/jed/src/occa/lib/libocca.so
#1 0x00007ffff5aeed56 in ceed::occa::arrayToMemory<double> (array=array@entry=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.hpp:27
#2 0x00007ffff5aecdef in ceed::occa::Vector::useArrayPointer (this=0x555559006800, mtype=<optimized out>, array=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:246
#3 0x00007ffff5aed2f0 in ceed::occa::Vector::setArray (this=0x555559006800, mtype=mtype@entry=CEED_MEM_DEVICE, cmode=cmode@entry=CEED_USE_POINTER, array=array@entry=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:166
#4 0x00007ffff5aed366 in ceed::occa::Vector::ceedSetArray (vec=<optimized out>, mtype=CEED_MEM_DEVICE, cmode=CEED_USE_POINTER, array=0x7ffe4be05400) at /home/jed/src/libCEED/backends/occa/ceed-occa-vector.cpp:450
#5 0x00007ffff5aac1fe in CeedVectorSetArray (vec=0x5555590f9920, mem_type=<optimized out>, copy_mode=copy_mode@entry=CEED_USE_POINTER, array=<optimized out>) at /home/jed/src/libCEED/interface/ceed-vector.c:274
#6 0x000055555557d9ea in ICs_FixMultiplicity (dm=0x555555d65480, ceed_data=0x555555d11720, user=<optimized out>, Q_loc=0x55555911d3b0, Q=0x555559118b30, time=<optimized out>, time@entry=0) at /home/jed/src/libCEED/examples/fluids/src/misc.c:38
#7 0x000055555555a4ef in main (argc=<optimized out>, argv=<optimized out>) at /home/jed/src/libCEED/examples/fluids/navierstokes.c:174
I'm just testing locally so far; does this work for you at JLSE?
> Our CI is back online so I think we'll be able to test this now with CUDA and/or ROCm. We have oneAPI compilers installed, though not with Intel hardware. What would you consider a good choice for testing?
The OCCA CUDA backend should be sufficient for testing. If you really wanted to test the SYCL backend specifically, you could build the public Intel LLVM compilers with the SYCL CUDA plugin enabled.
Correct me if I'm wrong, but I understand from the comments and commits that the core tests, t1*-t5* and ex1, ex2 all pass with these changes.
In preparation for our upcoming release, I'd like to get this initial work merged. Specifically, I would like to merge the changes in Makefile, backends/occa/*, and backends/ceed-backend-list.h, and omit the changes in examples/*, tests/*, and include/ceed/jit-source/gallery/*.
I think a couple of changes need to be added:
- [x] Update for changes in main (rebase or merge?)
- [ ] Move files used in JiT to include/ceed/jit-source/occa
- [x] Update tests/junit.py (replaces tests/tap.sh) to only run tests for t1*-t3* on OCCA backends
- [x] Update README.md with new OCCA min version and backends
- [x] Update docs/sphinx/source/releasenotes.md
I can help with any/all of those changes
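For illustration only, the test-filtering item in the list above could be sketched roughly as follows. This is not the actual contents of tests/junit.py; the helper names `is_occa_backend` and `should_run`, the pattern tuple, and the example test names are all hypothetical, and the patterns are a literal reading of "t1*-t3*" (so t0* tests would also be skipped on OCCA backends under this reading).

```python
import fnmatch

# Hypothetical allow-list: a literal reading of "t1*-t3*" from the task list.
OCCA_TEST_PATTERNS = ("t1*", "t2*", "t3*")


def is_occa_backend(resource: str) -> bool:
    """Return True if a resource string such as '/cpu/self/occa' names an OCCA backend."""
    return "occa" in resource.strip("/").split("/")


def should_run(test_name: str, resource: str) -> bool:
    """Decide whether a test should run for a given backend resource.

    Non-OCCA backends run everything; OCCA backends run only the tests
    whose names match one of the allow-listed patterns.
    """
    if not is_occa_backend(resource):
        return True
    return any(fnmatch.fnmatchcase(test_name, p) for p in OCCA_TEST_PATTERNS)
```

With this sketch, `should_run("t110-vector", "/gpu/cuda/occa")` is true, while `should_run("t500-operator", "/cpu/self/occa")` is false and the same test still runs on `/gpu/cuda`.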
Thanks for all the hard work here @kris-rowe!
PETSc released last week, so I'd like to tidy up the big open PRs and get a libCEED release soon. I don't want to step on any toes, but I have the time to run down those small tasks I listed above so we get this PR into the release if that's ok with you.
> PETSc released last week, so I'd like to tidy up the big open PRs and get a libCEED release soon. I don't want to step on any toes, but I have the time to run down those small tasks I listed above so we get this PR into the release if that's ok with you.
If you have time that would be a huge help.
Ok, https://github.com/CEED/libCEED/pull/1072 should be ready to merge now
Shall we rebase this after #1072? And is it possible to make block diagonal assembly fall back to CPU for now? This will allow us to run ceed-fluids on PVC with a narrowly localized performance distortion.
Here's a suggested test.
diff --git i/examples/fluids/blasius.yaml w/examples/fluids/blasius.yaml
index cf3056b1e..fd7516ee6 100644
--- i/examples/fluids/blasius.yaml
+++ w/examples/fluids/blasius.yaml
@@ -4,7 +4,7 @@ implicit: true
ts:
adapt_type: 'none'
type: 'beuler'
- dt: 0.2e-5
+ dt: 0.1e-5
max_time: 1.0e-3
output_freq: 10
@@ -51,3 +51,15 @@ stg:
use: false
inflow_path: "./STGInflow_blasius.dat"
mean_only: true
+
+pmat_pbdiagonal:
+ksp_type: bcgsl
+pc_type: vpbjacobi
+amat_type: shell
+
+# monitors
+ts_monitor:
+snes_monitor:
+ksp_converged_reason:
+
+ceed: /gpu/cuda
It should produce output like
build/fluids-navierstokes -options_file examples/fluids/blasius.yaml
[...]
0 TS dt 1e-06 time 0.
0 SNES Function norm 6.801978174922e-04
Linear solve converged due to CONVERGED_RTOL iterations 222
1 SNES Function norm 1.515152456321e-07
Linear solve converged due to CONVERGED_RTOL iterations 286
2 SNES Function norm 4.628934945817e-11
1 TS dt 1e-06 time 1e-06
[...]
For performance monitoring, you can add -log_view -log_view_gpu_time.
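If it helps when comparing backends, the monitor output above can be checked mechanically. The following is a minimal sketch, assuming the log lines have exactly the format shown in this comment; the function name `parse_monitor_output` and the regular expressions are my own and are not part of libCEED or PETSc. Lines reporting a failed linear solve are not matched by this sketch.

```python
import re

# Patterns for the two monitor line formats shown in the expected output.
SNES_RE = re.compile(r"^\s*(\d+) SNES Function norm (\S+)")
KSP_RE = re.compile(r"Linear solve converged due to (\w+) iterations (\d+)")


def parse_monitor_output(text):
    """Extract SNES residual norms and KSP iteration counts from a log."""
    snes_norms = []
    ksp_iterations = []
    for line in text.splitlines():
        m = SNES_RE.match(line)
        if m:
            snes_norms.append(float(m.group(2)))
            continue
        m = KSP_RE.search(line)
        if m:
            ksp_iterations.append(int(m.group(2)))
    return snes_norms, ksp_iterations


# The first time step of the expected output quoted above.
SAMPLE = """\
0 TS dt 1e-06 time 0.
0 SNES Function norm 6.801978174922e-04
Linear solve converged due to CONVERGED_RTOL iterations 222
1 SNES Function norm 1.515152456321e-07
Linear solve converged due to CONVERGED_RTOL iterations 286
2 SNES Function norm 4.628934945817e-11
1 TS dt 1e-06 time 1e-06
"""

norms, iterations = parse_monitor_output(SAMPLE)
```

Two runs (for example `/gpu/cuda` and an OCCA backend) could then be compared by checking that the norm sequences agree to a chosen tolerance and that the iteration counts are similar.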