
[BUG] CUTLASS python emit utility for sm90 GEMM produces code with incorrect syntax/missing header files

jayhshah opened this issue · 3 comments

Describe the bug

The sm90 GEMM code produced by the `cutlass.emit.pytorch` utility has incorrect syntax and missing header files.

Steps/Code to reproduce bug

Install the CUTLASS Python interface (via `pip install nvidia-cutlass` or by building from source), then run the following on a machine with compute capability 9.0 (sm90):

```python
import cutlass
import torch

dtype = torch.float16
plan = cutlass.op.Gemm(element=dtype, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, name='cutlass_gemm', cc=plan.cc, sourcedir='out', jit=False)
```

The generated code has at least three problems that prevent `TORCH_CUDA_ARCH_LIST="9.0" python setup.py install` from succeeding:

  1. Missing header files.
  2. Incorrect/outdated syntax for constructing `DeviceKernel::Arguments`.
  3. Missing build settings in the generated setup.py: for the `ext_modules` argument, I needed to add `libraries=['cuda']` and pass `--generate-code=arch=compute_90a,code=[sm_90a]` as an nvcc flag in `extra_compile_args`.
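A minimal sketch of the build settings from item 3, assuming the extension is built with `torch.utils.cpp_extension.CUDAExtension`. The helper `sm90_extension_kwargs` is hypothetical, and the source file names are an assumption about what `cutlass.emit.pytorch` writes into `sourcedir`; the `-std=c++17` flags reflect CUTLASS 3.x's C++17 requirement rather than anything stated in this issue.

```python
# Hypothetical helper (not part of CUTLASS): gathers the extra settings the
# report needed so an sm90 CUTLASS 3.x GEMM extension builds.
# Assumption: the emitted sources are named "<name>.cpp" and "<name>_kernel.cu".

SM90A_GENCODE = "--generate-code=arch=compute_90a,code=[sm_90a]"


def sm90_extension_kwargs(name: str) -> dict:
    """Return keyword arguments for torch.utils.cpp_extension.CUDAExtension."""
    return {
        "name": name,
        "sources": [f"{name}.cpp", f"{name}_kernel.cu"],  # assumed file names
        # Link libcuda (the CUDA driver library), as the report required.
        "libraries": ["cuda"],
        "extra_compile_args": {
            "cxx": ["-std=c++17"],
            # sm_90a (not plain sm_90) enables the Hopper-specific
            # instructions, such as wgmma, that sm90 kernels rely on.
            "nvcc": ["-std=c++17", SM90A_GENCODE],
        },
    }
```

In setup.py, the result could be passed as `ext_modules=[CUDAExtension(**sm90_extension_kwargs('cutlass_gemm'))]` together with `cmdclass={'build_ext': BuildExtension}`. The `cuda` library is likely needed because TMA descriptor setup in sm90 kernels goes through the CUDA driver API.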

Copying the header includes and the argument syntax from one of the Hopper GEMM examples fixed problems 1 and 2 for me.

Once these three changes are made, the example works as expected.

Expected behavior

The `cutlass.emit.pytorch` utility for GEMM with `cc=90` produces code that builds correctly, whether by incorporating the fixes I outlined or otherwise.

Environment details

Tested with CUDA 12.3 and CUTLASS 3.4.1/3.5 on my bare-metal local machine with an H100 PCIe GPU. OS is Ubuntu 20.04.6 LTS.

Additional details

All sm80 GEMM examples I tried work perfectly for me.

jayhshah, Mar 25 '24

Thanks for the detailed bug report! CC @jackkosaian

thakkarV, Mar 25 '24

Thanks! I thought this was being tracked in our CI, but it turns out that the unit tests related to PyTorch extension emission all involved emitting CUTLASS 2.x kernels.

I'm testing out these fixes internally. At the latest, we'll push out a fix when we tag 3.5.

jackkosaian, Mar 25 '24

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot], May 03 '24

Closed by https://github.com/NVIDIA/cutlass/pull/1623

thakkarV, Jul 10 '24