llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[ROCm OpenCL] device::get_info<device::sub_group_sizes> throws Native API failed

Open al42and opened this issue 3 years ago • 12 comments

Describe the bug

With ROCm 4.5.2, trying to call device.get_info<sycl::info::device::sub_group_sizes>() on an AMD device throws cl::sycl::runtime_error.

To Reproduce

#include <CL/sycl.hpp>
#include <iostream>

int main() {
  std::vector<sycl::device> devices = sycl::device::get_devices();

  for (const auto &dev : devices) {
    std::cout << "Device " << dev.get_info<sycl::info::device::name>()
              << std::endl;
    const auto sg_sizes = dev.get_info<sycl::info::device::sub_group_sizes>();
  }
}
$ clang++ -g -fsycl sgsize.cpp -o sgsizes && SYCL_DEVICE_FILTER=opencl:gpu ./sgsizes
Device gfx906:sramecc+:xnack-
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted

Works fine with SYCL_DEVICE_FILTER=hip:gpu.

Looking at sycl/plugins/opencl/pi_opencl.cpp, it seems CL_DEVICE_SUB_GROUP_SIZES_INTEL is passed to clGetDeviceInfo, which, being non-Intel, does not know what to do with it.

It is understandable that the OpenCL runtime does not support this extension. But it would be nice if it was handled more gracefully, not with a cryptic Native API failed. Native API returns: -30 (CL_INVALID_VALUE).

Better yet, of course, is to provide the supported sub-group sizes.

Environment (please complete the following information):

  • OS: Linux
  • Target device and vendor: AMD MI50 (gfx906)
  • DPC++ version: 16e1920d6cb7793c5d19ac3361677031fbafad08
  • Dependencies version: ROCm 4.5.2

al42and avatar Mar 16 '22 18:03 al42and

Is the value of 'sub_group_size' 64 for AMD devices ?

zjin-lcf avatar Mar 20 '22 03:03 zjin-lcf

@zjin-lcf AMD RDNA architecture also supports 32-wide execution.

EDIT: But I specifically am more concerned with the function not throwing rather than querying the support for Wave32.

al42and avatar Mar 20 '22 16:03 al42and

Do I understand correctly, that notion of warps in HIP is the same as in CUDA and it matches the subgroups notion in SYCL? Then, I reckon this info should use hipDeviceAttributeWarpSize, shouldn't it?

s-kanaev avatar Mar 21 '22 14:03 s-kanaev

The code is in place for quite a time: https://github.com/intel/llvm/blob/sycl/sycl/plugins/hip/pi_hip.cpp#L1085

@al42and , could you, please, clarify if you encounter exception throw with AMD GPU and HIP backend or OpenCL backend?

@malixian, could you, please, comment here?

s-kanaev avatar Mar 21 '22 14:03 s-kanaev

Do I understand correctly, that notion of warps in HIP is the same as in CUDA and it matches the subgroups notion in SYCL?

It's called "wavefronts" in AMD lingo, but yes, it's the same thing.

Then, I reckon this info should use hipDeviceAttributeWarpSize, shouldn't it?

Almost. With HIP backend, it indeed should use, and in fact uses, hipDeviceAttributeWarpSize, which returns 64 as expected. But here, the problem is with the OpenCL backend which is happy to use AMD ICD when available:

[opencl:gpu:0] AMD Accelerated Parallel Processing, gfx906:sramecc+:xnack- 2.0 [3361.0 (HSA1.1,LC)]
[opencl:gpu:1] AMD Accelerated Parallel Processing, gfx906:sramecc+:xnack- 2.0 [3361.0 (HSA1.1,LC)]
[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, gfx906:sramecc+:xnack- 0.0 [HIP 40421.43]
[ext_oneapi_hip:gpu:1] AMD HIP BACKEND, gfx906:sramecc+:xnack- 0.0 [HIP 40421.43]

So, the subgroup size detection works fine for ext_oneapi_hip:gpu devices, but crashes for opencl:gpu devices, because AMD's clGetDeviceInfo doesn't support the CL_DEVICE_SUB_GROUP_SIZES_INTEL property.

al42and avatar Mar 21 '22 14:03 al42and

@al42and , could you, please, clarify if you encounter exception throw with AMD GPU and HIP backend or OpenCL backend?

@s-kanaev, it is with OpenCL backend, as mentioned in the Issue title :)

al42and avatar Mar 21 '22 14:03 al42and

Sorry, my bad, I overlooked it.

s-kanaev avatar Mar 21 '22 14:03 s-kanaev

It's not clear how to return sub group size when backend implementation doesn't support it hence, I believe some other exception wording might suite here.

s-kanaev avatar Mar 21 '22 14:03 s-kanaev

I believe some other exception wording might suite here.

That would already be a big improvement. The current error is generic to the point of uselessness.

But AMD offers CL_DEVICE_WAVEFRONT_WIDTH_AMD property, which should work and would be a better solution.

al42and avatar Mar 21 '22 14:03 al42and

It's not clear how to return sub group size when backend implementation doesn't support it hence, I believe some other exception wording might suite here.

I don't think that would be OK. The spec doesn't allow for the sub_group_sizes info descriptor to throw.

One possible solution would be to hide any OpenCL devices which don't support the Intel subgroup extension so they can't be queried.

rolandschulz avatar Sep 22 '22 03:09 rolandschulz

The spec doesn't allow for the sub_group_sizes info descriptor to throw.

Are you sure? The spec says, "All member functions of the device class are synchronous and errors are handled by throwing synchronous SYCL exceptions." Some functions have "must throw X exception if Y" or "throws X if Y" wording, and some are marked noexcept, but I don't see anything saying that get_info<sub_group_sizes> is forbidden to throw. It encounters an error, it handles it according to the spec.

One possible solution would be to hide any OpenCL devices which don't support the Intel subgroup extension so they can't be queried.

Upon further investigation, I agree that it is the way to go, but with a slightly different rationale:

  • AMD OpenCL GPUs support cl_khr_subgroups extension, according to clinfo. So it's a bit strange to claim to hide them because they don't support subgroups (BTW, not an extension anymore, part of core SYCL2020).
  • However, a bigger problem might be that, as far as I can tell, IntelLLVM does not support compiling kernels for AMD OpenCL GPUs.
  • IntelLLVM is already hiding NVIDIA OpenCL GPUs, so if my understanding of the lack of kernel support is correct, it makes total sense to hide the devices for that reason.

al42and avatar Sep 22 '22 11:09 al42and

BTW, if the spec doesn't allow for the sub_group_sizes info descriptor to throw, then we have a similar defect with the host device (but with a more descriptive sycl::exception::what()):

Device SYCL host device
Sub-group feature is not supported on HOST device. -33 (PI_ERROR_INVALID_DEVICE)

al42and avatar Sep 22 '22 12:09 al42and