[HIP][CUDA] local size bytes
HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES | The local memory usage of each thread by this function in bytes.
HIP_FUNC_ATTRIBUTE_NUM_REGS | The number of registers used by each thread of this function.
If the local size in bytes is the memory usage of each thread, should the following case be moved to "urKernelGetInfo"?
case UR_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE: {
  // OpenCL PRIVATE == CUDA LOCAL
  int Bytes = 0;
  UR_CHECK_ERROR(cuFuncGetAttribute(
      &Bytes, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, hKernel->get()));
  return ReturnValue(uint64_t(Bytes));
}
This is more of a specification change proposal: essentially, it would move UR_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE from ur_kernel_group_info_t to ur_kernel_info_t.
We couldn't change it just for HIP and CUDA.
We would need to check whether this is possible for all targets. In theory, though, I think it makes sense: this information shouldn't be affected by the work size, so it shouldn't need to be in the group info.