oneMKL icon indicating copy to clipboard operation
oneMKL copied to clipboard

# HIP backend support for MKL BLAS domain with DPC++/SYCL compiler (RFC)

Open TejaX-Alaghari opened this issue 3 years ago • 2 comments

Summary

This RFC is intended to extend the HIP backend of BLAS domain, which is currently supported for hipSYCL compiler to Intel’s DPC++/SYCL compiler.

  • This backend can be exposed to the users by “ENABLE_ROCM_BACKEND=True” after setting the CXX and C compiler to point to the experimental DPC++/SYCL compiler with support for HIP.
  • This implementation intends to stabilize the compatibility with experimental DPC++/SYCL compiler for HIP support on AMD devices.
  • Testing and Performance analysis are yet to be done.
  • Build process and compilation is successfully validated.

Problem statement

To add the HIP backend support for MKL BLAS domain on DPC++/SYCL compiler for HIP on AMD devices.

Details

Our implementation aims to achieve this goal by mapping the functions of oneMKL’s BLAS domain to ROCm’s rocBLAS library which coincides with the hipSYCL implementation as well. We verified the same by successfully validating our implementation with the sources from hipSYCL rocBLAS backend.

And hence the following minor changes were made to extend the support for this HIP backend to DPC++/SYCL compiler –

  • Since “interop_handler” is marked to be deprecated, we implemented the “RocblasScopedContextHandler” with “interop_handle” instead
  • Used “host_task” method of “sycl::handler” for launching the queue with “interop_handle”
  • Added “hip_runtime” header file explicitly in “rocblas_helper.hpp” for utilizing HIP data types in the backend interface.
  • Added “rocBLAS_scope_handle.hpp” and “rocBLAS_scope_handle.cpp” files to implement the class “RocblasScopedContextHandler” with DPC++/SYCL compiler instead of hipSYCL compiler.
  • To do that, we followed the same structure as the cuBLAS backend and added a placeholder for “pi_context” and “get_native*” methods which are yet to be supported by the DPC++ compiler for HIP on AMD devices.
  • We’re currently experimenting with a workaround for this issue by utilizing SYCL objects (handles, contexts, and memory) directly without relying on the “get_native*” methods. And will update the PR based on those results.

TejaX-Alaghari avatar Mar 09 '22 13:03 TejaX-Alaghari

@TejaX-Alaghari Thanks for the detailed explanations. We are also looking forward to add this capability to our project. Here are a few comments from my side:

  1. As you mentioned, DPC++ currently have experimental support for HIP on AMD devices see here . My understanding is, you have used this experimental implementation in your PR. Can you please verify?
  2. If yes, I would recommend to wait until this implementation matures before we add it to our project. Because, once it is enabled, any issue due to DPC++/HIP would break our CI.
  3. Meanwhile, I would also recommend to open a "placeholder" pull request to our repo, so that we can take a look at the implementation and results.

mmeterel avatar Mar 14 '22 18:03 mmeterel

@mmeterel, Thanks for your inputs.

This implementation indeed uses the experimental HIP backend support provided by the Intel SYCL compiler which is yet to support the same level of features as CUDA backend.

As per your suggestion, We'll proceed with creating a "placeholder" pull request.

TejaX-Alaghari avatar Mar 16 '22 22:03 TejaX-Alaghari

#189 is merged. Closing this RFC.

mmeterel avatar Sep 30 '22 23:09 mmeterel