
[CUDA][shared memory allocation] fix 'ptxas error : Entry function 'fu…

Open AIYoungcino opened this issue 1 year ago • 3 comments

I converted a ViT model from ONNX and then ran relay.build for compilation, targeting an NVIDIA RTX 4090.

Running

```python
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```

fails with the following error:

```
Compilation error: ptxas error : Entry function 'tvmgen_default_fused_nn_conv2d_add_kernel' uses too much shared data (0x2ab44 bytes, 0x29000 max)
```

Apologies for falling back on a temporary workaround to get past this. In the meantime, I hope someone with more expertise can suggest a better way to resolve the problem. Thank you.

AIYoungcino avatar Aug 12 '24 02:08 AIYoungcino

I looked up the information NVIDIA provides; the maximum shared memory limits for each generation of GPU architecture are:

- 5.x: 64 KB
- 6.x: 64 KB
- 7.x: 96 KB
- 8.x: 164 KB

AIYoungcino avatar Aug 12 '24 03:08 AIYoungcino

Dynamic shared memory (the `shared.dyn` scope) should be used in this case to bypass the static size limit.

vinx13 avatar Aug 16 '24 00:08 vinx13

> Dynamic shared memory (the `shared.dyn` scope) should be used in this case to bypass the static size limit.

Thank you for your advice. If the result of the conv2d exceeds the maximum shared memory limit, storing it in shared memory would overflow; such data is typically passed to the kernel as a parameter or allocated in global (GDDR) memory, rather than sized as a static shared array at compile time.
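For reference, below is a minimal CUDA sketch of what the `shared.dyn` suggestion maps to at the CUDA level (this is illustrative hand-written code, not TVM-generated output; the kernel name and sizes are hypothetical). A statically sized `__shared__` array is checked by ptxas against the compile-time limit, whereas an `extern __shared__` buffer is sized at launch, and `cudaFuncSetAttribute` opts the kernel in to more than the 48 KB legacy default:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dynamic shared memory: the size is NOT baked in at compile time;
// it is supplied as the third kernel launch parameter.
extern __shared__ float tile[];

__global__ void scale(float *data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = data[i];  // stage through the dynamically sized buffer
        __syncthreads();
        data[i] = tile[threadIdx.x] * f;
    }
}

int main() {
    const int n = 256;
    const size_t smem = 96 * 1024;  // e.g. 96 KB, beyond the 48 KB static default

    // Opt this kernel in to the larger dynamic shared memory carve-out.
    // (Error checking omitted for brevity; the call fails if `smem` exceeds
    // the device's architectural maximum.)
    cudaFuncSetAttribute((const void *)scale,
                         cudaFuncAttributeMaxDynamicSharedMemorySize, (int)smem);

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    scale<<<1, n, smem>>>(d, n, 2.0f);  // third launch parameter = dynamic smem bytes
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Because ptxas never sees the dynamic size, the `0x29000 max` check from the original error does not apply to it; the limit is instead enforced at runtime against the device's actual capacity.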

AIYoungcino avatar Aug 16 '24 02:08 AIYoungcino