[BUG] CUTLASS python emit utility for sm90 GEMM produces code with incorrect syntax/missing header files
Describe the bug
The sm90 GEMM code produced by the `cutlass.emit.pytorch` utility has incorrect syntax and missing header files.
Steps/Code to reproduce bug
Install the CUTLASS Python interface via `pip install nvidia-cutlass` or by building from source, and then run the following on a machine with an sm90 (compute capability 9.0) GPU:
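```python
import cutlass
import torch

dtype = torch.float16
# On an sm90 machine, plan.cc defaults to the device's compute capability (90)
plan = cutlass.op.Gemm(element=dtype, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
# Write the PyTorch extension sources (including setup.py) to ./out instead of JIT-compiling
cutlass.emit.pytorch(op, name='cutlass_gemm', cc=plan.cc, sourcedir='out', jit=False)
```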
The produced code then has at least three problems that prevent `TORCH_CUDA_ARCH_LIST="9.0" python setup.py install` from working:
- Missing header files.
- Incorrect/outdated syntax for `DeviceKernel::Arguments arguments`.
- For the `ext_modules` argument in `setup.py`, I needed to add `libraries=['cuda']` as well as `--generate-code=arch=compute_90a,code=[sm_90a]` as an nvcc flag in `extra_compile_args`.
Copying over the header files and syntax from one of the Hopper GEMM examples fixed problems 1 and 2 for me. Once all three changes are made, the example works as expected.
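The third change can be sketched as a small helper that patches the keyword arguments the emitted `setup.py` passes to its extension in `ext_modules`. This is a minimal sketch, not the fix that landed upstream: the function name `add_sm90a_fixes` is hypothetical, and it assumes `extra_compile_args` uses the dict form (`{'cxx': [...], 'nvcc': [...]}`) accepted by `torch.utils.cpp_extension.CUDAExtension`. The only values it adds are the two from the report: `libraries=['cuda']` and the `compute_90a`/`sm_90a` code-generation flag for nvcc.

```python
import copy

# nvcc flag from the report: generate arch-specific sm_90a code, which enables
# Hopper-only instructions (e.g. wgmma) used by CUTLASS 3.x sm90 kernels.
SM90A_NVCC_FLAG = "--generate-code=arch=compute_90a,code=[sm_90a]"


def add_sm90a_fixes(ext_kwargs: dict) -> dict:
    """Return a copy of CUDAExtension keyword arguments with the sm90 build fixes applied.

    Hypothetical helper: assumes extra_compile_args, if present, is a dict
    of per-compiler argument lists such as {"cxx": [...], "nvcc": [...]}.
    """
    fixed = copy.deepcopy(ext_kwargs)  # leave the caller's dict untouched

    # Link against the CUDA driver library (libcuda), as the report required.
    libraries = fixed.setdefault("libraries", [])
    if "cuda" not in libraries:
        libraries.append("cuda")

    compile_args = fixed.setdefault("extra_compile_args", {})
    if not isinstance(compile_args, dict):
        raise TypeError("this sketch expects extra_compile_args as a dict of lists")

    # Append the sm_90a code-generation flag to the nvcc arguments only once.
    nvcc_args = compile_args.setdefault("nvcc", [])
    if SM90A_NVCC_FLAG not in nvcc_args:
        nvcc_args.append(SM90A_NVCC_FLAG)

    return fixed


if __name__ == "__main__":
    emitted = {"extra_compile_args": {"cxx": ["-O3"], "nvcc": ["-O3"]}}
    print(add_sm90a_fixes(emitted))
```

In `setup.py`, the result would be unpacked into the extension constructor, e.g. `CUDAExtension('cutlass_gemm', sources, **add_sm90a_fixes(existing_kwargs))`, where `sources` and `existing_kwargs` stand in for whatever the emitted file already uses. The helper is idempotent, so applying it to an already-patched dict changes nothing.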
Expected behavior
The `cutlass.emit.pytorch` utility for GEMM with `cc=90` produces code that builds correctly, whether by incorporating the fixes I outlined or otherwise.
Environment details: Tested with CUDA 12.3 and CUTLASS 3.4.1/3.5 on my bare-metal local machine with an H100 PCIe GPU. OS is Ubuntu 20.04.6 LTS.
Additional details: All sm80 GEMM examples I tried work perfectly for me.
Thanks for the detailed bug report! CC @jackkosaian
Thanks! I thought this was being tracked in our CI, but it turns out that the unit tests related to PyTorch extension emission all involved emitting CUTLASS 2.x kernels.
I'm testing out these fixes internally. At the latest, we'll push out a fix when we tag 3.5.
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
closed by https://github.com/NVIDIA/cutlass/pull/1623