
[CUDA][HIP] Use device to get native context

Open hdelan opened this issue 1 year ago • 5 comments

Since https://github.com/oneapi-src/unified-runtime/pull/999 it is no longer valid to get the native context from the SYCL context on a multi-GPU system. The get_native function for contexts has been deprecated for this reason; the native context should be obtained through the device instead. See https://github.com/intel/llvm/pull/10975

Similar ticket: https://github.com/oneapi-src/oneDNN/pull/1765

hdelan avatar Dec 07 '23 15:12 hdelan

Reading your changes, I have a question.

For example,

auto cudaDevice = sycl::get_native<sycl::backend::ext_oneapi_cuda>(queue.get_device());

Is the type of cudaDevice "CUdevice" ?

jinz2014 avatar Dec 11 '23 15:12 jinz2014

Hi @jinz2014 yes you are correct!

hdelan avatar Dec 11 '23 15:12 hdelan

cufft_run.txt All the DFT changes look good to me and I've run the DFT tests successfully. I'd like to see test logs for the other backends before I approve.

FMarno avatar Dec 11 '23 16:12 FMarno

AMD tests for lapack and blas all passing: test_amd.txt

8 LAPACK tests fail on NVIDIA (GTX1050), but these tests also fail on the develop branch: test_cuda_lapack.txt

NVIDIA BLAS tests all passing: test_cuda_blas.txt

hdelan avatar Dec 22 '23 17:12 hdelan

I see all the buffer tests failing for the rocblas backend with PI_ERROR_INVALID_OPERATION.

Logs: PR_425.txt

The failures are not because of the changes in this PR, but rather a recent change in the compiler. All these tests are expected to pass once https://github.com/oneapi-src/unified-runtime/pull/1226 and https://github.com/intel/llvm/pull/12297 are merged.

muhammad-tanvir-1211 avatar Jan 12 '24 11:01 muhammad-tanvir-1211

Some test results:

CUDA

gtx1050.txt Some failures due to precision also present on develop branch.

HIP

gfx90a_oneMKL_test.txt Test failures in HIP are also present on the develop branch: gfx90a_oneMKL_test_develop_branch.txt

I am not sure how to build/run the FFT tests. Are there some build/test instructions that I can follow?

hdelan avatar Mar 26 '24 14:03 hdelan

~~In terms of building with icpx 2024.0.0 for CUDA. I am getting a segfault at linking with develop branch.~~

Fixed. LD_LIBRARY_PATH problems -_-

I can successfully build this branch with icpx 2024.0.2 for CUDA

hdelan avatar Mar 26 '24 15:03 hdelan

Thanks a lot @hdelan ! The instructions are here but need to be improved.

The short answer is that you should just need to add -DENABLE_CUFFT_BACKEND=True -DENABLE_ROCFFT_BACKEND=True to also test the DFT domain with the native CUDA and HIP backends. If you are explicitly setting -DTARGET_DOMAINS in your CMake command, you will also need to append dft to the list; if you don't set it, dft is enabled by default. If you don't want to build and test the other domains again, you can use -DTARGET_DOMAINS=dft.
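As a rough sketch, a configure step restricted to the DFT domain might look like the following. Only the three -D options are taken from the comment above; the source and build directory layout (-S . -B build) is an assumption for illustration, and any compiler selection or other backend options your existing CMake command already uses are omitted here and would still be needed.

```shell
# Configure only the DFT domain with the cuFFT and rocFFT backends.
# Assumption: run from the oneMKL source root, with "build" as the build directory.
# Add the compiler and toolchain options from your existing CMake command as needed.
cmake -S . -B build \
  -DENABLE_CUFFT_BACKEND=True \
  -DENABLE_ROCFFT_BACKEND=True \
  -DTARGET_DOMAINS=dft
```

To test a single vendor only, drop the flag for the backend that is not installed.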

Rbiessy avatar Mar 27 '24 08:03 Rbiessy

Thanks @Rbiessy !

Building rocFFT is broken for me, but this PR does not touch that code. Building with cuFFT is OK. Here are updated test results for all of oneMKL on CUDA, including cuBLAS, cuFFT, cuRAND and cuSOLVER:

gtx1050.txt

hdelan avatar Mar 27 '24 10:03 hdelan

Thanks! LGTM

ericlars avatar Mar 28 '24 16:03 ericlars

Thanks for the review. Let me know @lhuot or @mmeterel if you need more time, otherwise I will go ahead and merge this on Monday.

Rbiessy avatar Mar 28 '24 16:03 Rbiessy