[BUG]: `cuda.tile._exception.TileCompilerExecutionError: Return code 5`
Version
1.0.0
Version
13.0 Docker image followed by apt update && apt upgrade to install 13.1 libraries
Which installation method(s) does this occur on?
Pip
Describe the bug.
I expected the script to run successfully but instead saw
cuda.tile._exception.TileCompilerExecutionError: Return code 5
failed to compile Tile IR program
Unknown location
I installed tileiras via apt install cuda-tileiras-13-1.
Minimum reproducible example
import cuda.tile as ct
import cupy

TILE_SIZE = 16

@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)


def vector_add(a, b, result):
    assert a.shape == b.shape == result.shape
    grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
    ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))


a = cupy.ones(32)
b = cupy.ones(32)
c = cupy.zeros_like(b)
print(c)
vector_add(a, b, c)
print(c)
Relevant log output
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 362, in compile_cubin
    subprocess.run(command + flags, env=env, check=True, capture_output=True,
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/cuda-13.1/bin/tileiras', '/tmp/tmp9x992294/vector_add_kernel4fdeil0v.bytecode', '-o', '/tmp/tmp9x992294/vector_add_kernel4fdeil0v.cubin', '--gpu-name', 'sm_100', '-O3', '--lineinfo']' returned non-zero exit status 5.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/cutile.py", line 25, in <module>
    vector_add(a, b, c)
  File "/root/cutile.py", line 18, in vector_add
    ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 223, in __call__
    lib = compile_tile(self.pyfunc, pyfunc_args, self.compiler_options, tile_context)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 70, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 211, in compile_tile
    raise e
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 204, in compile_tile
    cubin_file = compile_cubin(f.name, compiler_options, sm_arch,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/cuda/tile/_compile.py", line 365, in compile_cubin
    raise TileCompilerExecutionError(e.returncode, e.stderr.decode(), ' '.join(flags),
cuda.tile._exception.TileCompilerExecutionError: Return code 5
failed to compile Tile IR program
Unknown location
nvidia-smi output:
Fri Dec 5 21:33:21 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA B200                    On  |   00000000:52:00.0 Off |                    0 |
| N/A   33C    P0            138W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Full env printout
I'm not sure what print_env.sh is, so here is the output of env instead:
NPP_VERSION=13.0.1.2
SHELL=/bin/bash
NVIDIA_VISIBLE_DEVICES=all
DALI_URL_SUFFIX=130
DALI_BUILD=
CUSOLVER_VERSION=12.0.4.66
NVJITLINK_VERSION=13.0.88
CUBLAS_VERSION=13.0.2.14
NVFATBIN_VERSION=13.0.85
HOSTNAME=umbriel-b200-094
CUFILE_VERSION=1.15.1.6
NVIDIA_REQUIRE_CUDA=cuda>=9.0
CUFFT_VERSION=12.0.0.61
SSH_AUTH_SOCK=/tmp/ssh-auth-sock
NVVM_VERSION=13.0.88
DOCA_VERSION=3.1.0
INSIDE_EMACS=vterm,tramp:2.8.1-pre
EMACS_VTERM_PATH=/Users/aportnoy/.emacs.d/elpa/vterm-20250929.1514/
NCCL_VERSION=2.27.7
MODEL_OPT_VERSION=0.33.0
EDITOR=vim
CUSPARSE_VERSION=12.6.3.3
ENV=''
NVSHMEM_VERSION=3.4.5
PWD=/root
OPENUCX_VERSION=1.19.0
NSIGHT_SYSTEMS_VERSION=2025.5.1.121
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
JAX_TOOLBOX_REF=eb20147012072daeb09b0171acc24c1b9de46c71
POLYGRAPHY_VERSION=0.49.26
BUILD_DATE=2025-12-05
PIP_BREAK_SYSTEM_PACKAGES=1
CUDA_ARCH_LIST=7.5 8.0 8.6 9.0 10.0 12.0
CUSPARSELT_VERSION=0.8.1.1
TRT_VERSION=10.13.3.9
CUDA_BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.09-cuda13.0-devel-ubuntu24.04
NVIDIA_PRODUCT_NAME=CUDA
RDMACORE_VERSION=56.0
SRC_PATH_TRANSFORMER_ENGINE=/opt/transformer-engine
HOME=/root
HISTFILE=/root/.tramp_history
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:*~=00;90:*#=00;90:*.bak=00;90:*.crdownload=00;90:*.dpkg-dist=00;90:*.dpkg-new=00;90:*.dpkg-old=00;90:*.dpkg-tmp=00;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:*.swp=00;90:*.tmp=00;90:*.ucf-dist=00;90:*.ucf-new=00;90:*.ucf-old=00;90:
CUDA_VERSION=13.0.1.012
CURAND_VERSION=10.4.0.35
PROMPT_COMMAND=
CUBLASMP_VERSION=0.5.1.65
HPCX_VERSION=2.24.1
CUDNN_FRONTEND_VERSION=1.14.0
LESSCLOSE=/usr/bin/lesspipe %s %s
TERM=xterm-256color
GDRCOPY_VERSION=2.5
LESSOPEN=| /usr/bin/lesspipe %s
NVRX_VERSION=0.4.1+cuda13
OPENMPI_VERSION=4.1.7
NVPTXCOMPILER_VERSION=13.0.88
NVJPEG_VERSION=13.0.1.86
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
AWS_OFI_NCCL_VERSION=1.14.0
XLA_FLAGS= --xla_gpu_enable_latency_hiding_scheduler=true
SHLVL=2
BASH_ENV=/etc/bash.bashrc
PAGER=less
CUDNN_VERSION=9.13.0.50
MAXSMVER=
EFA_VERSION=1.38.1
NSIGHT_COMPUTE_VERSION=2025.3.1.4
DALI_VERSION=1.51.2
LD_LIBRARY_PATH=/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
LC_CTYPE=''
PS3=
OMPI_MCA_coll_hcoll_enable=0
OPAL_PREFIX=/opt/hpcx/ompi
MANIFEST_FILE=/opt/manifest.d/manifest.yaml
CUDA_DRIVER_VERSION=580.82.07
LC_ALL=C.utf8
TRANSFORMER_ENGINE_VERSION=2.7
TMOUT=0
_CUDA_COMPAT_PATH=/usr/local/cuda/compat
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=
PATH=/usr/local/cuda/bin:/usr/local/mpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/amazon/efa/bin:/opt/mellanox/doca/tools/
CUDLA_VERSION=13.0.1.012
MOFED_VERSION=5.4-rdmacore56.0
TRTOSS_VERSION=
OLDPWD=/root/cupy
_=/usr/bin/env
Other/Misc.
No response
Contributing Guidelines
- [x] I agree to follow cuTile Python's contributing guidelines
- [x] I have searched the open bugs and have found no duplicates for this bug report
Hi @andportnoy, the 13.0 Docker image only contains the 13.0 toolkit. cuTile requires tileiras, ptxas, and libnvvm from the 13.1 toolkit, so installing only cuda-tileiras-13-1 is not sufficient.
The most robust approach is to start from a fresh Linux image and install the 13.1 toolkit.
If you don't want to install the full toolkit, you will need at least cuda-tileiras-13-1 and cuda-compiler-13-1.
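For anyone who wants to check this before launching a kernel, here is a minimal sketch that looks for the three components named above under a toolkit root. The function name `missing_components` and the directory layout (`bin/` and `nvvm/lib64/`, taken from the paths that appear in this thread) are assumptions for illustration, not part of cuTile.

```python
from pathlib import Path

# Binaries cuTile is said to need from the 13.1 toolkit (per this thread).
REQUIRED_BINARIES = ("tileiras", "ptxas")


def missing_components(cuda_root):
    """Return the names of required components not found under cuda_root.

    Hypothetical helper: assumes binaries live in <root>/bin and libnvvm
    in <root>/nvvm/lib64, as in the paths quoted in this thread.
    """
    root = Path(cuda_root)
    missing = [name for name in REQUIRED_BINARIES
               if not (root / "bin" / name).is_file()]
    # libnvvm ships as a versioned shared object, so match any libnvvm.so*.
    lib_dir = root / "nvvm" / "lib64"
    if not (lib_dir.is_dir() and any(lib_dir.glob("libnvvm.so*"))):
        missing.append("libnvvm")
    return missing


if __name__ == "__main__":
    # An empty list means all three components were found.
    print(missing_components("/usr/local/cuda-13.1"))
```

If the list is non-empty, the missing entries should correspond to packages that still need to be installed from the 13.1 toolkit.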
Hi @haijieg, I encountered the same error message when compiling the example cuTile code above.
I used an NVIDIA GeForce RTX 5060 Ti GPU with CUDA driver version 591.44 and CUDA Toolkit 13.1 installed in a fresh WSL2 Ubuntu 24.04 environment. The cuTile package was installed via pip install cuda-tile.
I also tried running the test in a fresh Ubuntu 24.04 Docker container inside WSL2 (with CUDA Toolkit 13.1 installed manually). It ended with the same error.
P.S. I'm not sure whether this belongs in a new issue. If so, please tell me. I'm happy to provide more environment info.
@xujh333 In a WSL environment, please make sure your driver comes from Windows and is not installed within Ubuntu. If you still get the same error, please file a separate issue. In your WSL2 environment with CTK 13.1 installed, please share the output of the following commands:
ls -al /usr/local/cuda/bin/
ls -al /usr/local/cuda/nvvm/lib64
You should see "tileiras" and "ptxas" symlinked from /usr/local/cuda-13.1/bin/, and libnvvm.so.4.0.0 symlinked from /usr/local/cuda-13.1/nvvm/lib64.
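The same symlink check can be scripted. The sketch below follows the links with `os.path.realpath` and reports whether each tool ends up under the 13.1 install directory. The helper name `resolves_into` is hypothetical, and the paths are simply the ones quoted above.

```python
import os


def resolves_into(path, expected_root):
    """True if path, after following symlinks, lives under expected_root."""
    real = os.path.realpath(path)
    root = os.path.realpath(expected_root)
    # commonpath compares whole path components, so cuda-13.10 != cuda-13.1.
    return os.path.commonpath([real, root]) == root


if __name__ == "__main__":
    for tool in ("tileiras", "ptxas"):
        link = os.path.join("/usr/local/cuda/bin", tool)
        print(tool, "->", os.path.realpath(link),
              resolves_into(link, "/usr/local/cuda-13.1"))
```

A `False` here would suggest that /usr/local/cuda still points at an older toolkit, which matches the situation described in the original report.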
Installing tileiras and the compiler libs as follows fixed the issue, thanks!
apt install cuda-tileiras-13-1 cuda-compiler-13-1
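To confirm after installing that the ptxas found on PATH is the 13.1 one, a rough sketch like the following can help. It assumes `ptxas --version` prints a line containing "release X.Y", as NVIDIA's compiler tools normally do. The function names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_release(version_text):
    """Extract (major, minor) from a 'release X.Y' line, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def ptxas_release(ptxas="ptxas"):
    """Run `ptxas --version` and return its release tuple, or None."""
    exe = shutil.which(ptxas)
    if exe is None:
        return None  # ptxas is not on PATH at all
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_release(result.stdout)


if __name__ == "__main__":
    release = ptxas_release()
    print("ptxas release:", release)
    if release is not None and release < (13, 1):
        print("ptxas is older than 13.1; cuTile needs the 13.1 compiler tools")
```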