cutlass [QST] Global variable inside conv2d kernel


Status: Open. IzanCatalan opened this issue 2 months ago (22 comments).

What is your question?

Hello, good day. I am currently studying the Conv2dFprop kernel because I intend to modify its implementation in the library, specifically in the file https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/conv/kernel/implicit_gemm_convolution.h.

I chose this file because, whether the convolution is launched from a C++ or a Python program on the host, this header is the last level of the call hierarchy: it directly implements the convolution that runs on the GPU (in my case, NVIDIA A100 and V100).

My question is: can a global variable be declared within this class? I intend to fill this global variable, called multiply_tensor, with a specific number of elements. It would then be used to multiply the convolution parameters in that call and in every following call to the class, for as long as the host program is running.
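
A minimal CUDA sketch of the global-variable option, using a toy kernel rather than CUTLASS's actual `ImplicitGemmConvolution`. According to the CUDA C++ Programming Guide, memory-space specifiers such as `__device__` are not allowed on class data members, so the variable has to be declared at namespace scope, where the class's device code can still read it. Lazy initialization inside the main kernel would need synchronization across blocks, so this sketch instead launches a separate init kernel the first time the host wrapper is called. `ToyKernel`, `run`, `demo_two_calls`, `kMultiplyElems`, and the placeholder values are all hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical buffer size; not a CUTLASS constant.
constexpr int kMultiplyElems = 4;

// A namespace-scope __device__ array lives in GPU global memory for the
// lifetime of the CUDA context, so values written by one launch are still
// visible to later launches. It cannot be declared as a class data member.
__device__ float multiply_tensor[kMultiplyElems];

// Computes the buffer contents on the GPU (placeholder values 2, 3, 4, 5).
__global__ void init_multiply_tensor() {
  int i = threadIdx.x;
  if (i < kMultiplyElems) multiply_tensor[i] = 2.0f + i;
}

// Toy stand-in for a kernel class whose device-side operator() does the work.
struct ToyKernel {
  __device__ void operator()(float* data, int n) const {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= multiply_tensor[i % kMultiplyElems];  // reads the global
  }
};

__global__ void toy_kernel_entry(float* data, int n) { ToyKernel{}(data, n); }

// Host wrapper: launches the init kernel only on the first call. Both kernels
// go to the same (default) stream, so init finishes before the first use.
void run(float* d_data, int n) {
  static bool initialized = false;
  if (!initialized) {
    init_multiply_tensor<<<1, kMultiplyElems>>>();
    initialized = true;
  }
  toy_kernel_entry<<<(n + 127) / 128, 128>>>(d_data, n);
}

// Usage: start from all ones, call run() twice, copy the result back to `out`.
void demo_two_calls(float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = 1.0f;
  float* d = nullptr;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemcpy(d, out, n * sizeof(float), cudaMemcpyHostToDevice);
  run(d, n);  // first call: initialize multiply_tensor, then multiply
  run(d, n);  // second call: reuse multiply_tensor as is
  cudaMemcpy(out, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // waits for both launches
  cudaFree(d);
}
```

With `n = 8`, each element is multiplied by `2 + i % 4` on both calls, so `out` becomes `{4, 9, 16, 25, 4, 9, 16, 25}`. Note that the global is a single instance shared by every launch in the process.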

I want this variable to be stored in GPU memory, initialized and processed on the GPU during the first call, and reused in subsequent calls. I am unsure whether a global variable is the right solution or whether a new kernel parameter would be better.
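
For comparison, a minimal sketch of the kernel-parameter option, again with hypothetical names (`ToyParams`, `MultiplyBuffer`, `run`, `demo_two_calls`) and placeholder values. The host allocates the buffer once, a small kernel fills it on the GPU during the first call, and every later launch receives the same device pointer as an argument. This is loosely similar to how CUTLASS kernels receive operand pointers through a parameter struct, but it does not use any CUTLASS types.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter struct; not one of CUTLASS's Params types.
struct ToyParams {
  float* data;            // operand to scale in place
  const float* multiply;  // persistent buffer, passed by pointer
  int n;                  // number of elements in `data`
  int m;                  // number of elements in `multiply`
};

// Fills the buffer on the GPU (placeholder values 2, 3, 4, 5, ...).
__global__ void init_multiply(float* multiply, int m) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < m) multiply[i] = 2.0f + i;
}

__global__ void toy_kernel(ToyParams p) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < p.n) p.data[i] *= p.multiply[i % p.m];
}

// Host-side owner of the buffer: allocates and initializes it on the first
// call, then hands out the same device pointer for every later launch.
struct MultiplyBuffer {
  float* ptr = nullptr;
  int m = 0;
  MultiplyBuffer() = default;
  MultiplyBuffer(const MultiplyBuffer&) = delete;             // single owner
  MultiplyBuffer& operator=(const MultiplyBuffer&) = delete;
  ~MultiplyBuffer() { if (ptr) cudaFree(ptr); }

  const float* get(int elems) {
    if (ptr == nullptr) {
      m = elems;
      cudaMalloc(&ptr, m * sizeof(float));
      init_multiply<<<(m + 127) / 128, 128>>>(ptr, m);  // default stream: runs before later launches
    }
    return ptr;
  }
};

void run(MultiplyBuffer& buf, float* d_data, int n) {
  const float* mult = buf.get(4);  // hypothetical size of 4 elements
  ToyParams p{d_data, mult, n, buf.m};
  toy_kernel<<<(n + 127) / 128, 128>>>(p);
}

// Usage: start from all ones, call run() twice, copy the result back to `out`.
void demo_two_calls(float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = 1.0f;
  float* d = nullptr;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemcpy(d, out, n * sizeof(float), cudaMemcpyHostToDevice);
  MultiplyBuffer buf;
  run(buf, d, n);  // first call: allocate + initialize + multiply
  run(buf, d, n);  // second call: reuse the same buffer
  cudaMemcpy(out, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // waits for both launches
  cudaFree(d);
}
```

Passing the pointer keeps ownership explicit and lets different callers use different buffers, whereas a `__device__` global is one instance shared by every launch in the process.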

Is this feasible?

Dec 15 '24 23:12 IzanCatalan
