[BUG] Calling cutlass.cuda.initialize_cuda_context() silently kills python
Which component has the problem?
CuTe DSL
Bug Report
Describe the bug
Calling cutlass.cuda.initialize_cuda_context() makes all subsequent console output disappear, and it appears to terminate the Python process as well.
Steps/Code to reproduce bug
print("start printing")
import logging
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s:%(name)s:%(message)s")
import torch
import cutlass
props = torch.cuda.get_device_properties(torch.cuda.current_device())
print(props)
print("print before cuda context")
cutlass.cuda.initialize_cuda_context()
print("after cuda context")
with open("check_python_running.txt", "w") as f:
    f.write("python is running\n")
The output is
start printing
DEBUG:cutlass._mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:cutlass._mlir._mlir_libs:Registering dialects from initializer <module 'cutlass._mlir._mlir_libs._site_initialize_0' from '/home/jmt/Projects/stalkeye/hdr/.venv/lib/python3.13/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/_mlir/_mlir_libs/_site_initialize_0.cpython-313-x86_64-linux-gnu.so'>
DEBUG:cutlass._mlir._mlir_libs:Loading all available dialects
_CudaDeviceProperties(name='NVIDIA GeForce RTX 3090', major=8, minor=6, total_memory=24152MB, multi_processor_count=82,OMITTED
print before cuda context
No check_python_running.txt file was created, which implies the Python process was killed during the call to initialize_cuda_context(), before the following print could run.
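One way to confirm that the interpreter was killed, rather than merely having its output swallowed, is to run the reproducer as a child process and inspect its return code. The following is a minimal sketch using only the standard library; `describe_exit` and `run_and_report` are hypothetical helper names, and `repro.py` in the usage note stands for the script above saved to disk. On POSIX, `subprocess` reports a negative return code when the child was terminated by a signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_report(argv: list[str]) -> str:
    """Run a command and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)
```

For example, `print(run_and_report([sys.executable, "repro.py"]))` would show whether the script exited normally or died from a signal such as SIGSEGV or SIGABRT.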
Environment
(hdr) jmt@jmtcluster:~/Projects/stalkeye/hdr$ uv pip list | grep nvidia
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.3.0
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvshmem-cu12 3.3.20
nvidia-nvtx-cu12 12.8.90
Tue Nov 25 14:43:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:08:00.0 Off | N/A |
| 0% 43C P8 15W / 350W | 305MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2308 G /usr/lib/xorg/Xorg 251MiB |
| 0 N/A N/A 4206 G /usr/bin/gnome-shell 13MiB |
| 0 N/A N/A 4661 G ...irefox/7298/usr/lib/firefox/firefox 11MiB |
+-----------------------------------------------------------------------------------------+
Ubuntu 22.04
For further reference: the root cause was that the NVIDIA driver was too old for the version of CuTe DSL used here. Perhaps initializing the CUDA context should first check the driver version (as reported by nvidia-smi), so that it fails with a clear error instead of crashing as described.
I think we can add some detection code inside initialize_context?
Sure, I would be happy to make a PR if you can point me to where exactly the mapping between CuTe DSL versions and driver versions is defined.
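As a starting point for such a check, here is a rough sketch of what a driver-version guard could look like. It is not the CuTe DSL implementation. It calls the CUDA driver API function `cuDriverGetVersion` through `ctypes`, which reports the newest CUDA version the installed driver supports, encoded as 1000 * major + 10 * minor. The minimum of 12.8 is an assumption taken from the CUDA 12.8 wheels listed in the report (the driver there reports CUDA 12.6); the actual requirement of each CuTe DSL release is not stated in this thread. `ASSUMED_MIN_CUDA`, `decode_cuda_version`, `driver_cuda_version`, and `check_driver` are hypothetical names.

```python
import ctypes

# Assumption: taken from the cu12 12.8 wheels in the report, not from
# any documented CuTe DSL requirement.
ASSUMED_MIN_CUDA = (12, 8)


def decode_cuda_version(raw: int) -> tuple[int, int]:
    """Decode the 1000 * major + 10 * minor value from cuDriverGetVersion."""
    return raw // 1000, (raw % 1000) // 10


def driver_cuda_version() -> tuple[int, int]:
    """Ask libcuda which CUDA version the installed driver supports."""
    libcuda = ctypes.CDLL("libcuda.so.1")  # Linux library name
    raw = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(raw))
    if status != 0:  # CUDA_SUCCESS is 0
        raise RuntimeError(f"cuDriverGetVersion failed with CUresult {status}")
    return decode_cuda_version(raw.value)


def check_driver(found: tuple[int, int],
                 required: tuple[int, int] = ASSUMED_MIN_CUDA) -> None:
    """Raise a readable error instead of letting the process die later."""
    if found < required:
        raise RuntimeError(
            f"NVIDIA driver supports CUDA {found[0]}.{found[1]}, but at least "
            f"{required[0]}.{required[1]} is assumed to be required; "
            "update the driver."
        )
```

Calling `check_driver(driver_cuda_version())` before creating the context would turn the silent exit into a Python exception on a machine like the one in the report. The real mapping between CuTe DSL releases and minimum driver versions would need to replace the hard-coded constant.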