Cabana icon indicating copy to clipboard operation
Cabana copied to clipboard

WIP: Add heFFTe SYCL support

Open streeve opened this issue 2 years ago • 16 comments

Run in the CI with CPU

streeve avatar Jun 08 '22 20:06 streeve

@mkstoyanov I haven't looked into getting this working for a while - should we update the heFFTe version from 2.1 for oneapi support?

streeve avatar Jun 08 '22 21:06 streeve

I see one old bug that came from an old SYCL update. I had to redefine some kernel templates ... update to 2.1, that problem should be fixed.

mkstoyanov avatar Jun 08 '22 21:06 mkstoyanov

@streeve In #540 I have heffte 2.2.0 updates, if that helps.

sfogerty avatar Jun 09 '22 21:06 sfogerty

Codecov Report

Merging #538 (298d9ec) into master (298d9ec) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 298d9ec differs from pull request most recent head 69453a5. Consider uploading reports for the commit 69453a5 to get more accurate results

@@          Coverage Diff           @@
##           master    #538   +/-   ##
======================================
  Coverage    94.9%   94.9%           
======================================
  Files          47      47           
  Lines        5734    5734           
======================================
  Hits         5444    5444           
  Misses        290     290           

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Jun 09 '22 21:06 codecov[bot]

Thanks @mkstoyanov updating to 2.2 worked. And thanks, @sfogerty, let's merge #540 before this one

streeve avatar Jun 13 '22 14:06 streeve

@YuxingQiu how simple is it to reimplement the sparse map without recursion? If it's simple enough let's try to add in a separate PR. Otherwise we'll have to disable sparse with the SYCL backend for now

From the fedora:intel dpcpp run: error: SYCL kernel cannot call a recursive function

streeve avatar Jun 13 '22 14:06 streeve

Oh, I did have some recursions when computing the bit used to represent a given integer number and some for Coord-ID conversion. The advantage of recursion is that it suits more cases and dimensional requirements, making the implementation more clean and elegant (since it's unified for any given case and dimension numbers). It's doable if we do want to change to a non-recursion version. The drawback could be that we may need separate implementations for different cases, like 2D and 3D. (well, now we don't support any other instances than 3D, but we may do later). I can do whichever you prefer.

YuxingQiu avatar Jun 14 '22 19:06 YuxingQiu

@junghans any ideas here on the unrecognized argument oversubscribe MPI errors?

streeve avatar Jun 17 '22 15:06 streeve

@junghans any ideas here on the unrecognized argument oversubscribe MPI errors?

It looks like for some reason the mpi coming with the dpcpp doesn't support that argument...

junghans avatar Jun 17 '22 15:06 junghans

@junghans I switched to only run dpcpp with one rank and now it can't even find the exe..

streeve avatar Jun 20 '22 19:06 streeve

@junghans I switched to only run dpcpp with one rank and now it can't even find the exe..

Do we need the full path here: https://github.com/ECP-copa/Cabana/blob/290d1aafe65f5febcba245f62788aa8ef5dc2bbe/cmake/test_harness/test_harness.cmake#L94

${_target} -> ${CMAKE_CURRENT_BINARY_DIR}/${_target}

junghans avatar Jun 21 '22 15:06 junghans

@streeve I have been having a tough time with SYCL compiles for Cabana - I think the compiler install on our nodes doesn't include some needed tools. (Fails even when I include those tools in $PATH...)

Here I have pushed an untested fix. Basically, heFFTe recommends using a sycl::queue with this backend. (see https://bitbucket.org/icl/heffte/src/master/examples/heffte_example_sycl.cpp)

sfogerty avatar Jun 30 '22 18:06 sfogerty

@steverangel similar idea here, but different failures with the CPU SYCL build

streeve avatar Aug 04 '22 20:08 streeve

@streeve Is it intentional that the CI is enabling both Heffte_ENABLE_ONEAPI=ON and Heffte_ENABLE_MKL=ON for the CI matrix value Heffte_MKL_ONEAPI?

In any case, the CI has found an issue in our FFT defaults when both MKL and ONEAPI are enabled. I'm trying to thing how we would want to handle that - which backend we should set as default, or whether we can easily support compiling for both options at the same time.

sfogerty avatar Sep 02 '22 19:09 sfogerty

Is it intentional that the CI is enabling both Heffte_ENABLE_ONEAPI=ON and Heffte_ENABLE_MKL=ON for the CI matrix value Heffte_MKL_ONEAPI?

As I recall the intent was to enable both host and device FFTs if the right backends were enabled. I don't think we've ever tested that or have a clear case that needs it though. If you disable MKL and everything works okay, then that's okay with me. Either way we need to be clear in the documentation

streeve avatar Sep 02 '22 19:09 streeve

Is it intentional that the CI is enabling both Heffte_ENABLE_ONEAPI=ON and Heffte_ENABLE_MKL=ON for the CI matrix value Heffte_MKL_ONEAPI?

As I recall the intent was to enable both host and device FFTs if the right backends were enabled. I don't think we've ever tested that or have a clear case that needs it though. If you disable MKL and everything works okay, then that's okay with me. Either way we need to be clear in the documentation

Quirks of OneMKL, since SYCL code is not exclusively GPU, you need both the CPU and GPU versions of the MKL to run. There are other dependencies and ultimately, you cannot use heffte OneAPI without also enabling CPU MKL. If you want to pick a default backend, you can do that using type-trait templates:

heffte::backend::is_enabled<backend-tag>::value
heffte::backend::uses_gpu<backend-tag>::value

mkstoyanov avatar Sep 05 '22 07:09 mkstoyanov