
CL_OUT_OF_RESOURCES when compiling a SPIR-V with a bunch of small kernels

Open pjaaskel opened this issue 2 years ago • 5 comments

Is there a (relatively low) size limit for built SPIR-V modules? I'm getting CL_OUT_OF_RESOURCES when trying to build (via the CHIP-SPV runtime) a rocPRIM unit test that contains a bunch of test kernels. Omitting some of the kernels makes the test pass (I can also re-enable the omitted ones in turn, and they pass if I disable some of the others). This reproduces via both OpenCL and Level Zero.
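The manual pruning described above could be automated to find a smallest set of kernels that still triggers the failure. The following is only a sketch: `build_fails` is a hypothetical callback (not part of any driver or CHIP-SPV API) that is assumed to rebuild the module with just the given kernels and return `True` when the build reports CL_OUT_OF_RESOURCES. The kernel names and the fake budget-based predicate are made up for illustration.

```python
from typing import Callable, List, Sequence


def minimal_failing_subset(
    kernels: Sequence[str],
    build_fails: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily drop kernels while the build keeps failing.

    Returns a subset that still fails and from which no single kernel
    can be removed without making the build succeed (1-minimal).
    """
    current = list(kernels)
    if not build_fails(current):
        raise ValueError("the full kernel set builds fine; nothing to minimize")
    for name in list(current):
        candidate = [k for k in current if k != name]
        # Assumption: an empty module is not interesting, so never test it.
        if candidate and build_fails(candidate):
            current = candidate
    return current


def make_fake_build_fails(max_kernels: int) -> Callable[[List[str]], bool]:
    """Hypothetical stand-in for a real build: fails above a kernel budget."""
    def build_fails(subset: List[str]) -> bool:
        return len(subset) > max_kernels
    return build_fails


if __name__ == "__main__":
    names = [f"test_kernel_{i}" for i in range(6)]  # made-up kernel names
    print(minimal_failing_subset(names, make_fake_build_fails(3)))
```

This needs at most one rebuild per kernel. If the result depends only on how many kernels remain rather than on which ones, that points at a cumulative per-module resource rather than at one problematic kernel.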

In this case it's not a question of one large monolithic kernel that might fill up instruction memory, but of a dozen or so smaller kernels that are launched separately. If an instruction memory limit is the cause here, a lazy strategy that deploys kernel binaries at launch time should avoid it.

The kernels use a bit of shared memory, but not much. Is there a way to make the driver dump more information about the reason for the out-of-resources error?

SPIR-Vs of the working and non-working cases: spvs.zip

pjaaskel avatar Nov 21 '22 09:11 pjaaskel

@pjaaskel could you share which NEO driver version you are using?

JablonskiMateusz avatar Nov 29 '22 12:11 JablonskiMateusz

It seems I have quite an old version (1.0.0). I was under the assumption that I'd get updates through the apt package `intel-oneapi-runtime-opencl`, but it seems that one only covers the CPU driver? Am I still supposed to upgrade the GPU OpenCL driver via the GitHub .debs? I'm confused. I'll try upgrading via the .debs to see if the latest version fixes it.

pjaaskel avatar Nov 29 '22 14:11 pjaaskel

Please run `clinfo` and check the Driver Version field.
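If it helps, the check can be scripted. This is a minimal sketch that assumes `clinfo` is on the PATH and prints one `Driver Version` line per device in its default human-readable output; the helper names `driver_versions` and `query_driver_versions` are made up for this example.

```python
import re
import subprocess
from typing import List


def driver_versions(clinfo_text: str) -> List[str]:
    """Extract every 'Driver Version' value from clinfo's text output."""
    return re.findall(
        r"^\s*Driver Version\s+(\S+)\s*$", clinfo_text, flags=re.MULTILINE
    )


def query_driver_versions() -> List[str]:
    """Run clinfo (assumed to be installed) and return the versions found."""
    out = subprocess.run(
        ["clinfo"], capture_output=True, text=True, check=True
    ).stdout
    return driver_versions(out)


if __name__ == "__main__":
    for version in query_driver_versions():
        print(version)
```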

JablonskiMateusz avatar Nov 29 '22 15:11 JablonskiMateusz

It seems I still get the same CL_OUT_OF_RESOURCES problem with Driver Version 22.43.24558. It works when I prune down the number of tests. Is there a way to debug the actual reason (which resource it runs out of)?

pjaaskel avatar Nov 29 '22 15:11 pjaaskel