unified-runtime

[CUDA] Misleading error handling for hasExceededMaxRegistersPerBlock

Open rafbiels opened this issue 1 year ago • 0 comments

This follows from #1299, which originally included a change of the error code. After further discussion with @GeorgeWeb, we agreed the error handling improvement should be a separate PR, paired with correct handling of the changed error code in intel/llvm.

The issue is that when a user submits a kernel with a reasonable work-group size that nevertheless exceeds the number of registers available on the device, the setKernelParams function in source/adapters/cuda/enqueue.cpp returns UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE, which presents the user with the following error:

terminate called after throwing an instance of 'sycl::_V1::nd_range_error'
  what():  Non-uniform work-groups are not supported by the target device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)

even if they submit perfectly uniform work groups. This came up initially in https://github.com/intel/llvm/issues/12363, where the global and local sizes were 2048x788 and 1024x1, respectively. The error is simply wrong in this case: an "out of resources" error should be reported instead.
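To make the intended distinction concrete, here is a minimal, self-contained sketch of how the two failure modes could be told apart. It is not the adapter's actual code: `LaunchCheck`, `DeviceLimits`, and `classifyLaunch` are hypothetical names, the enum only mirrors the relevant UR result codes, and the register counts and device limits in the usage note are made-up values.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the relevant ur_result_t values.
enum class LaunchCheck {
  Success,
  InvalidWorkGroupSize, // cf. UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE
  OutOfResources        // cf. UR_RESULT_ERROR_OUT_OF_RESOURCES
};

// Hypothetical subset of device limits needed for the check.
struct DeviceLimits {
  std::size_t maxThreadsPerBlock;
  std::size_t maxRegistersPerBlock;
};

// Classify a 2D launch: shape problems map to InvalidWorkGroupSize,
// register pressure maps to OutOfResources.
LaunchCheck classifyLaunch(const std::array<std::size_t, 2> &global,
                           const std::array<std::size_t, 2> &local,
                           std::size_t regsPerThread,
                           const DeviceLimits &limits) {
  std::size_t threadsPerBlock = 1;
  for (std::size_t d = 0; d < 2; ++d) {
    // A zero or non-dividing local size is a genuine work-group size error.
    if (local[d] == 0 || global[d] % local[d] != 0)
      return LaunchCheck::InvalidWorkGroupSize;
    threadsPerBlock *= local[d];
  }
  // Too many work-items per group is also a work-group size error.
  if (threadsPerBlock > limits.maxThreadsPerBlock)
    return LaunchCheck::InvalidWorkGroupSize;
  // The shape is valid, but the kernel needs more registers than one
  // block can provide: this is a resource problem, not a size problem.
  if (threadsPerBlock * regsPerThread > limits.maxRegistersPerBlock)
    return LaunchCheck::OutOfResources;
  return LaunchCheck::Success;
}
```

With the sizes from the linked issue (global 2048x788, local 1024x1) and assumed limits of 1024 threads and 65536 registers per block, a kernel using an assumed 96 registers per thread would be classified as `OutOfResources`, since the work groups divide the global range evenly.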

Side note / another related but separate issue: the intel/llvm message for UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE also seems misleading in any case, because "invalid work group size" does not always mean non-uniform work-groups; it could also mean, e.g., exceeding the max work group size.
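As an illustration of what a less misleading message could look like, the sketch below picks the text based on the actual cause. `describeInvalidWorkGroupSize` is a hypothetical helper, not an existing intel/llvm function, and it handles a single dimension for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: choose a message that matches the real reason
// a work-group size was rejected, instead of always blaming uniformity.
std::string describeInvalidWorkGroupSize(std::size_t globalSize,
                                         std::size_t localSize,
                                         std::size_t maxWorkGroupSize) {
  if (localSize == 0)
    return "Work-group size must be non-zero";
  if (localSize > maxWorkGroupSize)
    return "Work-group size " + std::to_string(localSize) +
           " exceeds the device maximum of " +
           std::to_string(maxWorkGroupSize);
  if (globalSize % localSize != 0)
    return "Global size " + std::to_string(globalSize) +
           " is not a multiple of work-group size " +
           std::to_string(localSize);
  // No specific cause identified; fall back to a neutral message.
  return "Invalid work-group size";
}
```

Checking the maximum before uniformity is a design choice here: an oversized work group is reported as such even when it also fails to divide the global range.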

rafbiels avatar Feb 01 '24 14:02 rafbiels