Cabana
Cabana copied to clipboard
WIP: Add heFFTe SYCL support
Run in the CI with CPU
@mkstoyanov I haven't looked into getting this working for a while - should we update the heFFTe version from 2.1 for oneapi support?
I see one old bug that came from an old SYCL update. I had to redefine some kernel templates ... update to 2.1, that problem should be fixed.
@streeve In #540 I have heffte 2.2.0 updates, if that helps.
Codecov Report
Merging #538 (298d9ec) into master (298d9ec) will not change coverage. The diff coverage is
n/a
.
:exclamation: Current head 298d9ec differs from pull request most recent head 69453a5. Consider uploading reports for the commit 69453a5 to get more accurate results
@@ Coverage Diff @@
## master #538 +/- ##
======================================
Coverage 94.9% 94.9%
======================================
Files 47 47
Lines 5734 5734
======================================
Hits 5444 5444
Misses 290 290
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Thanks @mkstoyanov updating to 2.2 worked. And thanks, @sfogerty, let's merge #540 before this one
@YuxingQiu how simple is it to reimplement the sparse map without recursion? If it's simple enough let's try to add in a separate PR. Otherwise we'll have to disable sparse with the SYCL backend for now
From the fedora:intel dpcpp
run:
error: SYCL kernel cannot call a recursive function
Oh, I did have some recursions when computing the bit used to represent a given integer number and some for Coord-ID conversion. The advantage of recursion is that it suits more cases and dimensional requirements, making the implementation more clean and elegant (since it's unified for any given case and dimension numbers). It's doable if we do want to change to a non-recursion version. The drawback could be that we may need separate implementations for different cases, like 2D and 3D. (well, now we don't support any other instances than 3D, but we may do later). I can do whichever you prefer.
@junghans any ideas here on the unrecognized argument oversubscribe
MPI errors?
@junghans any ideas here on the
unrecognized argument oversubscribe
MPI errors?
It looks like for some reason the mpi coming with the dpcpp doesn't support that argument...
@junghans I switched to only run dpcpp with one rank and now it can't even find the exe..
@junghans I switched to only run dpcpp with one rank and now it can't even find the exe..
Do we need the full path here: https://github.com/ECP-copa/Cabana/blob/290d1aafe65f5febcba245f62788aa8ef5dc2bbe/cmake/test_harness/test_harness.cmake#L94
${_target}
-> ${CMAKE_CURRENT_BINARY_DIR}/${_target}
@streeve I have been having a tough time with SYCL compiles for Cabana - I think the compiler install on our nodes doesn't include some needed tools. (Fails even when I include those tools in $PATH...)
Here I have pushed an untested fix. Basically, heFFTe recommends using a sycl::queue
with this backend. (see https://bitbucket.org/icl/heffte/src/master/examples/heffte_example_sycl.cpp)
@steverangel similar idea here, but different failures with the CPU SYCL build
@streeve Is it intentional that the CI is enabling both Heffte_ENABLE_ONEAPI=ON
and Heffte_ENABLE_MKL=ON
for the CI matrix value Heffte_MKL_ONEAPI
?
In any case, the CI has found an issue in our FFT defaults when both MKL and ONEAPI are enabled. I'm trying to thing how we would want to handle that - which backend we should set as default, or whether we can easily support compiling for both options at the same time.
Is it intentional that the CI is enabling both
Heffte_ENABLE_ONEAPI=ON
andHeffte_ENABLE_MKL=ON
for the CI matrix valueHeffte_MKL_ONEAPI
?
As I recall the intent was to enable both host and device FFTs if the right backends were enabled. I don't think we've ever tested that or have a clear case that needs it though. If you disable MKL and everything works okay, then that's okay with me. Either way we need to be clear in the documentation
Is it intentional that the CI is enabling both
Heffte_ENABLE_ONEAPI=ON
andHeffte_ENABLE_MKL=ON
for the CI matrix valueHeffte_MKL_ONEAPI
?As I recall the intent was to enable both host and device FFTs if the right backends were enabled. I don't think we've ever tested that or have a clear case that needs it though. If you disable MKL and everything works okay, then that's okay with me. Either way we need to be clear in the documentation
Quirks of OneMKL, since SYCL code is not exclusively GPU, you need both the CPU and GPU versions of the MKL to run. There are other dependencies and ultimately, you cannot use heffte OneAPI without also enabling CPU MKL. If you want to pick a default backend, you can do that using type-trait templates:
heffte::backend::is_enabled<backend-tag>::value
heffte::backend::uses_gpu<backend-tag>::value