unified-runtime

[HIP][CUDA] local size bytes

Open · jinz2014 opened this issue · 1 comment

| Attribute | Description |
|---|---|
| `HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES` | The local memory used by each thread of this function, in bytes. |
| `HIP_FUNC_ATTRIBUTE_NUM_REGS` | The number of registers used by each thread of this function. |

Since the local size in bytes is the memory usage of each thread, should the following case be moved from `urKernelGetGroupInfo` to `urKernelGetInfo`?

```cpp
  case UR_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE: {
    // OpenCL PRIVATE == CUDA LOCAL: per-thread local memory used by the kernel.
    int Bytes = 0;
    UR_CHECK_ERROR(cuFuncGetAttribute(
        &Bytes, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, hKernel->get()));
    return ReturnValue(uint64_t(Bytes));
  }
```

jinz2014 · Jan 28 '25 22:01

This is really a specification change proposal: essentially, moving `UR_KERNEL_GROUP_INFO_PRIVATE_MEM_SIZE` from `ur_kernel_group_info_t` to `ur_kernel_info_t`.

We can't change it for HIP and CUDA alone.

We would need to check whether this is possible for all targets, but in principle it makes sense: this information shouldn't depend on the work-group size, so it shouldn't need to be a group info query.

npmiller · Feb 18 '25 16:02
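The proposal can be pictured with a minimal, self-contained sketch of a kernel-level query that answers the per-thread attributes using only the kernel. `MockKernel`, `MockKernelInfo`, and `mockGetKernelInfo` are hypothetical stand-ins, not Unified Runtime API; a real adapter would read the values from the driver, as the CUDA snippet in the issue does, and would use whatever enum the specification ends up defining.

```cpp
#include <cstdint>

// Hypothetical stand-in for a compiled kernel. In the CUDA/HIP adapters the
// per-thread values would come from cuFuncGetAttribute / hipFuncGetAttribute
// (e.g. CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES).
struct MockKernel {
  int LocalSizeBytes; // per-thread local memory (OpenCL "private"), in bytes
  int NumRegs;        // registers used by each thread
};

// Hypothetical kernel-level query enum; NOT the real ur_kernel_info_t.
enum class MockKernelInfo { PrivateMemSize, NumRegs };

// Kernel-level query: only the kernel is passed in, because both values are
// fixed when the kernel is compiled rather than chosen at launch time.
uint64_t mockGetKernelInfo(const MockKernel &K, MockKernelInfo Prop) {
  switch (Prop) {
  case MockKernelInfo::PrivateMemSize:
    // OpenCL PRIVATE == CUDA/HIP LOCAL
    return static_cast<uint64_t>(K.LocalSizeBytes);
  case MockKernelInfo::NumRegs:
    return static_cast<uint64_t>(K.NumRegs);
  }
  return 0; // unreachable for valid enum values
}
```

Since the function takes no launch or work-group argument, its shape mirrors the argument made in the thread: the private memory size belongs to the compiled kernel, alongside other per-thread attributes such as the register count.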