HIP
Request: implement hipOccupancyMaxPotentialBlockSize for AMD GPUs
The occupancy calculator API is an invaluable asset in CUDA. Unfortunately, hipOccupancyMaxPotentialBlockSize is only exposed on Nvidia GPUs for the time being. It would be immensely helpful if it were implemented for AMD GPUs.
I second this request.
Created an internal ticket SWDEV-180694 to track it. It'd be highly desirable to have this API implemented so machine learning frameworks can properly schedule available GPU resources efficiently.
Relevant code in TensorFlow:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/core/util/gpu_launch_config.h#L165
Without this function implemented in HIP, grid/block size selection on AMD hardware will always be sub-optimal.
I think this can be closed as of ROCm 2.7?
https://github.com/ROCm-Developer-Tools/HIP/blob/854768787ee9bbd6ed22b3e8fd0f139955a57e6a/src/hip_module.cpp#L1015
The HIP implementation is not equivalent to the corresponding CUDA function, which takes a callback so that the dynamic shared-memory size can be computed as a function of the block size.

Cc: @nbeams
I would further clarify that we would like a HIP version of the driver API function cuOccupancyMaxPotentialBlockSize, which I believe corresponds to cudaOccupancyMaxPotentialBlockSizeVariableSMem in the runtime API.
I see this was left as a TODO in https://github.com/ROCm-Developer-Tools/HIP/pull/1943/files#diff-9ec4991aeca8528b60eaf6d00b089eecda171d49742e348561c957c5fa2000feR1328-R1342
@gargrahul Can you suggest a workaround?
Hello, I was wondering if this is still being worked on? It's been 2 years since the last update here, and unless I'm making a pretty bad user error, it's still not working (it somehow breaks calls that occur before I even call it).