HIP
Request: implement hipOccupancyMaxPotentialBlockSize for AMD GPUs
The occupancy calculator API is an invaluable asset in CUDA. Unfortunately, hipOccupancyMaxPotentialBlockSize is only exposed on Nvidia GPUs for the time being. It would be immensely helpful if it were implemented for AMD GPUs.
I second this request.
Created an internal ticket SWDEV-180694 to track it. It'd be highly desirable to have this API implemented so machine learning frameworks can properly schedule available GPU resources efficiently.
Relevant code in TensorFlow:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/core/util/gpu_launch_config.h#L165
Without this function implemented in HIP, grid/block size selection on AMD hardware will always be sub-optimal.
I think this can be closed as of ROCm 2.7?
https://github.com/ROCm-Developer-Tools/HIP/blob/854768787ee9bbd6ed22b3e8fd0f139955a57e6a/src/hip_module.cpp#L1015
The HIP implementation is not equivalent to the corresponding CUDA function, which takes a callback so that the dynamic shared-memory size can be computed as a function of the block size.

Cc: @nbeams
I would further clarify that we would like a HIP version of the driver API function cuOccupancyMaxPotentialBlockSize, which I believe corresponds to cudaOccupancyMaxPotentialBlockSizeVariableSMem in the runtime API.
I see this was left as a TODO in https://github.com/ROCm-Developer-Tools/HIP/pull/1943/files#diff-9ec4991aeca8528b60eaf6d00b089eecda171d49742e348561c957c5fa2000feR1328-R1342
@gargrahul Can you suggest a workaround?
Hello, I was wondering if this is still being worked on? It's been 2 years since the last update here, and unless I'm making a pretty bad user error, it's still not working (it somehow breaks calls that occur before I even call it).