cutlass [BUG] Launch conv kernel with offline ptx failed as invalid argument

I am trying to load offline-compiled PTX at runtime, built from the same CUDA source file, and launch the kernel with cuLaunchKernel, but examples/16_ampere_tensorop_conv2dfprop fails with driver error code 1.


> CUDA Driver error at .../cutlass/examples/16_ampere_tensorop_conv2dfprop/../../include/cutlass/conv/device/implicit_gemm_convolution.h:285 code=1(invalid argument) "cuLaunchKernel(kernel, grid.x, grid.y, grid.z, block.x, block.y, block.z, smem_size, stream, args, 0)"
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  CUDA driver failed.

The same launch call works if the kernel handle is obtained with cudaGetFuncBySymbol. I also tried examples/00_basic_gemm, where both methods run without errors.
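For readers following along, the offline-PTX path described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the attached patch: `read_text_file` and `load_kernel_from_ptx` are hypothetical helper names, `mangled_name` has to be copied from the `.entry` line of kernel.ptx, and a CUDA context is assumed to be current on the calling thread already. The driver-API part is guarded so the file still compiles on a machine without the CUDA toolkit.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads a whole text file (here: the offline-compiled PTX) into a string.
inline std::string read_text_file(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::ostringstream buf;
  buf << in.rdbuf();
  return buf.str();
}

#if __has_include(<cuda.h>)  // the driver-API part needs the CUDA toolkit
#include <cuda.h>

// Loads the PTX at `ptx_path` into the current context and looks up a kernel.
// `mangled_name` is a hypothetical parameter: the kernel's mangled symbol as
// it appears in the PTX. Assumes a context is already current.
inline CUfunction load_kernel_from_ptx(const std::string& ptx_path,
                                       const char* mangled_name,
                                       CUmodule* module_out) {
  const std::string ptx = read_text_file(ptx_path);
  // c_str() yields the NUL-terminated image that cuModuleLoadData expects.
  if (cuModuleLoadData(module_out, ptx.c_str()) != CUDA_SUCCESS)
    throw std::runtime_error("cuModuleLoadData failed");
  CUfunction fn = nullptr;
  if (cuModuleGetFunction(&fn, *module_out, mangled_name) != CUDA_SUCCESS)
    throw std::runtime_error("cuModuleGetFunction failed");
  return fn;
}
#endif
```

The returned handle would then be passed to cuLaunchKernel in place of the one from cudaGetFuncBySymbol. For a kernel that takes a single parameter struct by value, the argument array would presumably be `void* args[] = { &params };`.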

Environment: CUDA 11.6 on Ubuntu 20, RTX 3090

Steps to reproduce:

Apply the attached patch to latest code
change LAUNCH_MODE to 0
cd examples/16_ampere_tensorop_conv2dfprop/
make
cp tmp/*.ptx kernel.ptx
change LAUNCH_MODE back to 2
make clean && make
./case

It will show the error message above.

patch_offline_ptx.log

Apr 27 '22 02:04 shenzhenghai

Sorry, I don't know. I checked your code and I cannot find anything obvious. Maybe you can change this https://github.com/NVIDIA/cutlass/blob/master/examples/16_ampere_tensorop_conv2dfprop/ampere_tensorop_conv2dfprop.cu#L167 to kAnalytic first. kOptimized uses complex params to compute things in the host before sending to the device. Maybe that part is wrong.

I know JIT-compiling CUTLASS conv is doable because many users use CUTLASS this way.

Apr 27 '22 13:04 hwu36

Thanks for the quick reply. I tried kAnalytic and got the same invalid argument error.

In fact, the launch works with the same CUDA driver API call and the same parameters if the kernel handle is retrieved another way, with cudaGetFuncBySymbol (LAUNCH_MODE=1 in the patch). It seems the fatbin built into the executable might differ from the offline-compiled one. I also tried the following methods to get an offline PTX/cubin/fatbin, and none of them works:

nvcc compile time saved cubin
nvcc compile time saved fatbin
cuobjdump ptx from the executable
cuobjdump sass from the executable

It probably needs help from NVIDIA's driver or compiler team to find the root cause.
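One possibility worth ruling out, which is not confirmed anywhere in this thread: a kernel that requests more than 48 KB of dynamic shared memory needs a per-function opt-in before launch. A handle obtained from a separately loaded module is a different function object from the runtime symbol, so an opt-in made through cudaFuncSetAttribute on the symbol would not be expected to apply to it. The sketch below shows the driver-API equivalent. `needs_smem_opt_in` and `prepare_for_launch` are hypothetical helper names, and the driver part is guarded so the file compiles without the CUDA toolkit.

```cpp
#include <cstddef>

// Dynamic shared memory above this size needs an explicit per-function opt-in.
constexpr std::size_t kDefaultDynamicSmemLimit = 48 * 1024;

// True when a launch with `smem_bytes` of dynamic shared memory would exceed
// the default limit.
constexpr bool needs_smem_opt_in(std::size_t smem_bytes) {
  return smem_bytes > kDefaultDynamicSmemLimit;
}

#if __has_include(<cuda.h>)  // the driver-API part needs the CUDA toolkit
#include <cuda.h>

// Raises the dynamic shared memory cap on a module-loaded kernel handle.
// Assumption: `kernel` came from cuModuleGetFunction, so no attribute set on
// the runtime-API symbol has been applied to it.
inline CUresult prepare_for_launch(CUfunction kernel, int smem_size) {
  if (!needs_smem_opt_in(static_cast<std::size_t>(smem_size)))
    return CUDA_SUCCESS;  // default limit is enough, nothing to do
  return cuFuncSetAttribute(
      kernel, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, smem_size);
}
#endif
```

If this were the cause, calling `prepare_for_launch(kernel, smem_size)` right before cuLaunchKernel in the offline-PTX path should make the error go away. If the error persists, the hypothesis can be discarded.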

Apr 27 '22 14:04 shenzhenghai

@mnicely May I know any updates for the bug? Thanks.

May 04 '22 02:05 shenzhenghai

@mnicely @hwu36 any updates after 2 weeks? Thanks.

May 09 '22 01:05 shenzhenghai

Sorry, I haven't looked into your case again. I know nvrtc + conv can work since many users make it work.

We are working on enabling CUTLASS conv with Python. You may take a look once it is available. See https://github.com/NVIDIA/cutlass/issues/474

May 09 '22 01:05 hwu36

@hwu36 I am enabling nvrtc with CUTLASS conv and ran into this issue. I suspected the runtime-compiled PTX might not be as expected, so I compared it with the PTX kept by nvcc; they were almost the same. Both PTX files hit the same launch error described at the beginning. The GEMM case was OK with either PTX generated by the same flow.

Is there any example for nvrtc + cutlass conv, or should I wait for #474?

May 09 '22 11:05 shenzhenghai

> waiting for https://github.com/NVIDIA/cutlass/issues/474?

Yes, if you still cannot figure it out.

May 09 '22 16:05 hwu36

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Jun 08 '22 16:06 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Sep 06 '22 17:09 github-actions[bot]

pycutlass, implemented in 2.10, supports using nvrtc to build conv kernels.

Sep 17 '22 02:09 hwu36