oneMKL
MKL FP16 GEMM crash on MTL iGPU
Summary
On an MTL (Meteor Lake) iGPU, calling the oneMKL FP16 GEMM crashes the program, with either oneAPI 2024.0 or 2024.2. Calling it many times freezes the machine outright. On Arc, the same code runs fine.
Version
oneAPI 2024.0 or oneAPI 2024.2.
Environment
- Minimal C++ program
- Windows 11
- icx 2024.0.2 or 2024.2
- Hardware: Intel Core Ultra iGPU
Steps to reproduce
#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <iostream>
using namespace sycl;

int main() {
    queue q{gpu_selector_v};
    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

    const int M = 1024;
    const int N = 11008;
    const int K = 4096;

    float* A_h = new float[M * K];
    float* B_h = new float[K * N];
    float* C_h = new float[M * N];

    // random
    for (int i = 0; i < M * K; ++i) {
        A_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
    }
    for (int i = 0; i < K * N; ++i) {
        B_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
    }

    // convert input to half
    sycl::half* A_h_half = new sycl::half[M * K];
    sycl::half* B_h_half = new sycl::half[K * N];
    for (int i = 0; i < M * K; ++i) {
        A_h_half[i] = sycl::half(A_h[i]);
    }
    for (int i = 0; i < K * N; ++i) {
        B_h_half[i] = sycl::half(B_h[i]);
    }

    buffer<sycl::half> A(A_h_half, M * K);
    buffer<sycl::half> B(B_h_half, K * N);
    buffer<float> C(C_h, M * N);

    // Use OneMKL to do GEMM
    {
        q.submit([&](handler &h) {
            sycl::accessor A_acc(A, h, sycl::write_only, sycl::no_init);
            sycl::accessor B_acc(B, h, sycl::write_only, sycl::no_init);
            sycl::accessor C_acc(C, h, sycl::write_only, sycl::no_init);
            oneapi::mkl::blas::row_major::gemm(
                q,
                oneapi::mkl::transpose::nontrans,
                oneapi::mkl::transpose::trans,
                M, N, K,
                1.0f, A_acc.get_pointer(), K,
                B_acc.get_pointer(), K,
                0.0f, C_acc.get_pointer(), N);
        }).wait();
    }

    delete[] A_h;
    delete[] B_h;
    delete[] C_h;
    delete[] A_h_half;
    delete[] B_h_half;

    printf("run success!\n");
    return 0;
}
Build the program above with the following commands:
# for linux
source /opt/intel/oneapi/setvars.sh
icpx -std=c++17 -fsycl -fopenmp -lpthread -l mkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -o gemm_fp16 gemm_fp16.cpp
# for windows
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --force
icx -std=c++17 -fsycl -fopenmp mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib -o gemm_fp16 gemm_fp16.cpp
Observed behavior
On Linux with an Arc A770, the program runs fine.
On Windows with the MTL iGPU, it fails and can even cause a black screen.
Expected behavior
The FP16 GEMM above should work on the MTL iGPU as it does on Arc. Thanks!