[ROCm OpenCL] device::get_info<device::sub_group_sizes> throws Native API failed
Describe the bug
With ROCm 4.5.2, trying to call device.get_info<sycl::info::device::sub_group_sizes>() on an AMD device throws cl::sycl::runtime_error.
To Reproduce
#include <CL/sycl.hpp>
#include <iostream>
int main() {
std::vector<sycl::device> devices = sycl::device::get_devices();
for (const auto &dev : devices) {
std::cout << "Device " << dev.get_info<sycl::info::device::name>()
<< std::endl;
const auto sg_sizes = dev.get_info<sycl::info::device::sub_group_sizes>();
}
}
$ clang++ -g -fsycl sgsize.cpp -o sgsizes && SYCL_DEVICE_FILTER=opencl:gpu ./sgsizes
Device gfx906:sramecc+:xnack-
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted
Works fine with SYCL_DEVICE_FILTER=hip:gpu.
Looking at sycl/plugins/opencl/pi_opencl.cpp, it seems CL_DEVICE_SUB_GROUP_SIZES_INTEL is passed to clGetDeviceInfo, which, being non-Intel, does not know what to do with it.
It is understandable that the OpenCL runtime does not support this extension. But it would be nice if it was handled more gracefully, not with a cryptic Native API failed. Native API returns: -30 (CL_INVALID_VALUE).
Better yet, of course, is to provide the supported sub-group sizes.
Environment (please complete the following information):
- OS: Linux
- Target device and vendor: AMD MI50 (gfx906)
- DPC++ version: 16e1920d6cb7793c5d19ac3361677031fbafad08
- Dependencies version: ROCm 4.5.2
Is the value of 'sub_group_size' 64 for AMD devices ?
@zjin-lcf AMD RDNA architecture also supports 32-wide execution.
EDIT: But I specifically am more concerned with the function not throwing rather than querying the support for Wave32.
Do I understand correctly, that notion of warps in HIP is the same as in CUDA and it matches the subgroups notion in SYCL?
Then, I reckon this info should use hipDeviceAttributeWarpSize, shouldn't it?
The code is in place for quite a time: https://github.com/intel/llvm/blob/sycl/sycl/plugins/hip/pi_hip.cpp#L1085
@al42and , could you, please, clarify if you encounter exception throw with AMD GPU and HIP backend or OpenCL backend?
@malixian, could you, please, comment here?
Do I understand correctly, that notion of warps in HIP is the same as in CUDA and it matches the subgroups notion in SYCL?
It's called "wavefronts" in AMD lingo, but yes, it's the same thing.
Then, I reckon this info should use hipDeviceAttributeWarpSize, shouldn't it?
Almost. With HIP backend, it indeed should use, and in fact uses, hipDeviceAttributeWarpSize, which returns 64 as expected. But here, the problem is with the OpenCL backend which is happy to use AMD ICD when available:
[opencl:gpu:0] AMD Accelerated Parallel Processing, gfx906:sramecc+:xnack- 2.0 [3361.0 (HSA1.1,LC)]
[opencl:gpu:1] AMD Accelerated Parallel Processing, gfx906:sramecc+:xnack- 2.0 [3361.0 (HSA1.1,LC)]
[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, gfx906:sramecc+:xnack- 0.0 [HIP 40421.43]
[ext_oneapi_hip:gpu:1] AMD HIP BACKEND, gfx906:sramecc+:xnack- 0.0 [HIP 40421.43]
So, the subgroup size detection works fine for ext_oneapi_hip:gpu devices, but crashes for opencl:gpu devices, because AMD's clGetDeviceInfo doesn't support the CL_DEVICE_SUB_GROUP_SIZES_INTEL property.
@al42and , could you, please, clarify if you encounter exception throw with AMD GPU and HIP backend or OpenCL backend?
@s-kanaev, it is with OpenCL backend, as mentioned in the Issue title :)
Sorry, my bad, I overlooked it.
It's not clear how to return sub group size when backend implementation doesn't support it hence, I believe some other exception wording might suite here.
I believe some other exception wording might suite here.
That would already be a big improvement. The current error is generic to the point of uselessness.
But AMD offers CL_DEVICE_WAVEFRONT_WIDTH_AMD property, which should work and would be a better solution.
It's not clear how to return sub group size when backend implementation doesn't support it hence, I believe some other exception wording might suite here.
I don't think that would be OK. The spec doesn't allow for the sub_group_sizes info descriptor to throw.
One possible solution would be to hide any OpenCL devices which don't support the Intel subgroup extension so they can't be queried.
The spec doesn't allow for the sub_group_sizes info descriptor to throw.
Are you sure? The spec says, "All member functions of the device class are synchronous and errors are handled by throwing synchronous SYCL exceptions." Some functions have "must throw X exception if Y" or "throws X if Y" wording, and some are marked noexcept, but I don't see anything saying that get_info<sub_group_sizes> is forbidden to throw. It encounters an error, it handles it according to the spec.
One possible solution would be to hide any OpenCL devices which don't support the Intel subgroup extension so they can't be queried.
Upon further investigation, I agree that it is the way to go, but with a slightly different rationale:
- AMD OpenCL GPUs support
cl_khr_subgroupsextension, according toclinfo. So it's a bit strange to claim to hide them because they don't support subgroups (BTW, not an extension anymore, part of core SYCL2020). - However, a bigger problem might be that, as far as I can tell, IntelLLVM does not support compiling kernels for AMD OpenCL GPUs.
- IntelLLVM is already hiding NVIDIA OpenCL GPUs, so if my understanding of the lack of kernel support is correct, it makes total sense to hide the devices for that reason.
BTW, if the spec doesn't allow for the sub_group_sizes info descriptor to throw, then we have a similar defect with the host device (but with a more descriptive sycl::exception::what()):
Device SYCL host device
Sub-group feature is not supported on HOST device. -33 (PI_ERROR_INVALID_DEVICE)