[CUDA] Max local mem size check should return OUT_OF_RESOURCES
Building on https://github.com/intel/llvm/pull/12604 and https://github.com/oneapi-src/unified-runtime/pull/1318, which add handleOutOfResources to dpcpp and return UR_RESULT_ERROR_OUT_OF_RESOURCES, the local mem size check:
https://github.com/oneapi-src/unified-runtime/blob/f086f369cab557bf2a589e22bfc37e18d7de5fa8/source/adapters/cuda/enqueue.cpp#L294-L298
should also return UR_RESULT_ERROR_OUT_OF_RESOURCES and have a dedicated error-handling case added in handleOutOfResources.
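To make the request concrete, here is a rough, self-contained sketch of the behaviour being asked for. It is not the adapter code: `Result`, `checkLocalMemSize`, and the `MaxLocalMem` parameter are hypothetical stand-ins invented for illustration, and only the message text and the two error kinds come from this issue.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the UR result codes named in this issue;
// the real definitions live in the Unified Runtime headers.
enum class Result {
  Success,
  ErrorAdapterSpecific, // what surfaces today as -996 "backend specific error"
  ErrorOutOfResources,  // the proposed return value
};

// Hypothetical helper mirroring the local mem size check.
// MaxLocalMem stands in for the device's maximum local memory capacity.
Result checkLocalMemSize(std::size_t LocalSize, std::size_t MaxLocalMem,
                         std::string &Message) {
  if (LocalSize > MaxLocalMem) {
    // Keep the existing, helpful message, but report it under the
    // out-of-resources code instead of the adapter-specific one.
    Message = "Excessive allocation of local memory on the device";
    return Result::ErrorOutOfResources;
  }
  return Result::Success;
}
```

With this shape, a caller that already handles the out-of-resources case could attach the message above to it, rather than going through the generic adapter-specific path.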
Right now submitting a kernel with too large local mem size results in:
Native API failed. Native API returns: -996 (The plugin has emitted a backend specific error)
Excessive allocation of local memory on the device
-996 (The plugin has emitted a backend specific error)
which does contain a helpful exception message, but it is wrapped in generic and confusing "backend specific error" messages and the unhelpful code -996. Having this return UR_RESULT_ERROR_OUT_OF_RESOURCES would make it easier for us to cover in the troubleshooting guide, and for users to find it with web search engines.
@GeorgeWeb I've assigned this to you since it's building on top of your PRs.