Does not work with CUDA from micromamba environment.
When installing CUDA and cumm in a micromamba environment, cumm fails at runtime.
Steps to reproduce:
# set up new env
micromamba create -n testenv -c conda-forge python=3.11
micromamba activate testenv
# install cuda
micromamba install cuda -c nvidia/label/cuda-12.1.1
# install spconv
pip install spconv-cu121
# run some code relying on cumm/spconv...
I guess reproducing this might also require running on an "old" GPU (a Titan X in my case) to force compilation of kernels that are not prebuilt; something along the lines of the sketch below should exercise that path.
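For completeness, a minimal snippet of the kind that triggers the failing NVRTC compilation. This is an illustrative sketch only (the tensor shapes and the `SubMConv3d` layer are my own choice, not from the original report):

```python
# Illustrative repro only: any spconv layer whose kernel is not prebuilt for
# the GPU (sm_61 here) forces cumm to JIT-compile it via NVRTC.
import torch
import spconv.pytorch as spconv

# 100 unique voxels on a small grid, single batch
coords = torch.stack(torch.meshgrid(
    torch.arange(5), torch.arange(5), torch.arange(4), indexing="ij"), dim=-1).reshape(-1, 3)
indices = torch.cat([torch.zeros(len(coords), 1, dtype=torch.long), coords], dim=1).int().cuda()
features = torch.randn(len(coords), 16, device="cuda")

x = spconv.SparseConvTensor(features, indices, spatial_shape=[5, 5, 4], batch_size=1)
conv = spconv.SubMConv3d(16, 32, kernel_size=3).cuda()
out = conv(x)  # fails here with the RuntimeError shown below
```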
Error / Details
Excerpt of the error
│ /home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/tensorview/__init__.py:323 in │
│ __init__ │
│ │
│ 320 │ │ if name_exprs is None: │
│ 321 │ │ │ name_exprs = [] │
│ 322 │ │ if isinstance(code, str): │
│ ❱ 323 │ │ │ self._mod = _NVRTCModule(code, headers, opts, program_name, │
│ 324 │ │ │ │ │ │ │ │ │ name_exprs, cudadevrt_path) │
│ 325 │ │ else: │
│ 326 │ │ │ self._mod = _NVRTCModule(code, cudadevrt_path) │
│ │
│ ╭───────────────────────────────────────────────────── locals ─────────────────────────────────────────────────────╮ │
│ │ code = '#include <cumm/common/TensorViewNVRTC.h>\n#include <cumm/common/TensorViewNVRTCKe'+2175 │ │
│ │ cudadevrt_path = '' │ │
│ │ headers = { │ │
│ │ │ 'cumm/common/TensorViewNVRTC.h': '#pragma once\n#include <tensorview/core/all.h>\n#include │ │
│ │ <tensorview/core/arrayops'+230, │ │
│ │ │ 'cumm/common/TensorViewNVRTCKernel.h': '#pragma once\n#include │ │
│ │ <tensorview/cuda/device_ops.h>\n#include <tensorview/gemm/d'+215, │ │
│ │ │ 'cumm/gemm/layout/RowMajor.h': '#pragma once\n#include │ │
│ │ <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+845, │ │
│ │ │ 'cumm/gemm/layout/ColumnMajor.h': '#pragma once\n#include │ │
│ │ <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+857, │ │
│ │ │ 'cumm/common/GemmBasic.h': '#pragma once\n#include │ │
│ │ <tensorview/core/nvrtc/runtime_include.h>\n#include <tensor'+254, │ │
│ │ │ 'cumm/common/GemmBasicKernel.h': '#pragma once\n#include │ │
│ │ <tensorview/gemm/arch/memory.h>\n#include <tensorview/gemm/'+245, │ │
│ │ │ 'cumm/gemm/utils/GemmUtilsCPU.h': '#pragma once\n#include │ │
│ │ <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+851, │ │
│ │ │ 'spconv/gemmutils/GemmUtils.h': '#pragma once\n#include │ │
│ │ <cumm/common/TensorViewNVRTC.h>\nnamespace spconv {\nnamespa'+985, │ │
│ │ │ 'spconv/inpitera/maskiter/PitchLinear.h': '#pragma once\n#include │ │
│ │ <cumm/common/TensorViewNVRTC.h>\nnamespace spconv {\nnamespa'+352, │ │
│ │ │ 'spconv/inpitera/maskiter/MaskTileIteratorParams.h': '#pragma once\nnamespace spconv │ │
│ │ {\nnamespace inpitera {\nnamespace maskiter {\nstruct'+613, │ │
│ │ │ ... +36 │ │
│ │ } │ │
│ │ name_exprs = ['spconv::gemm_kernel', 'spconv::nvrtc_kernel_cpu_out', '&spconv::params_raw'] │ │
│ │ name_to_meta = { │ │
│ │ │ 'spconv::gemm_kernel': NVRTCKernelMeta[name=gemm_kernel,ns=spconv,args=[]], │ │
│ │ │ 'spconv::nvrtc_kernel_cpu_out': │ │
│ │ NVRTCKernelMeta[name=nvrtc_kernel_cpu_out,ns=spconv,args=[NVRTCArgMeta(base_type=<NVRTCArgBase… │ │
│ │ 0>, valid=True, simple_type=-1, shape=[1], is_simple_ptr=False, is_scalar=True, count=None), │ │
│ │ NVRTCArgMeta(base_type=<NVRTCArgBaseType.Scalar: 0>, valid=True, simple_type=-1, shape=[1], │ │
│ │ is_simple_ptr=False, is_scalar=True, count=None)]] │ │
│ │ } │ │
│ │ opts = [ │ │
│ │ │ '-std=c++14', │ │
│ │ │ '--gpu-architecture=sm_61', │ │
│ │ │ '-DTV_ENABLE_HARDWARE_ACC', │ │
│ │ │ '-I', │ │
│ │ │ '/usr/local/cuda/include', │ │
│ │ │ '-I', │ │
│ │ │ '/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/i'+6 │ │
│ │ ] │ │
│ │ program_name = 'kernel.cu' │ │
│ │ self = <cumm.nvrtc.CummNVRTCModule object at 0x7fab1c39ec50> │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: /io/include/tensorview/cuda/nvrtc.h(96)
compileResult == NVRTC_SUCCESS assert faild. nvrtc compile failed. log:
/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/core/nvrtc/limits.h(689): warning #61-D: integer operation result is out of range
return -INT_MIN - 1;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/core/nvrtc/limits.h(689): warning #61-D: integer operation result is out of range
return -INT_MIN - 1;
^
/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/gemm/dtypes/float8.h(53): catastrophic error: cannot open source file "cuda_fp8.h"
#include <cuda_fp8.h>
^
1 catastrophic error detected in the compilation of "kernel.cu.cu".
Compilation terminated.
As you can see in the log above, cumm adds `/usr/local/cuda/include` to the include directories, even though nvcc is installed and working in the environment: `which nvcc` returns `/home/luz/.local/opt/micromamba/envs/tracker/bin/nvcc`.
What I believe goes wrong:
https://github.com/FindDefinition/cumm/blob/e6a95bd2f13e767ff91aa60742f361ebdd1c880b/cumm/common.py#L269-L277
- First off: `cumm` gets the correct `nvcc` path, so that is fine.
- Then, it tries to locate the include and lib directories. Unfortunately, here micromamba might differ from conda: in the environment I have `$env/lib/libcudart.so` (this is fine) but no `$env/targets` directory. Instead, the headers live directly in `$env/include`, including `$env/include/cuda.h`. So I believe the code snippet above should be adapted to also accept `Path(nvcc_path).parent.parent / "include"` as a valid option (see the sketch below).
- Right now, with the check failing, it falls back to `/usr/local/cuda/include`, which on my machine contains an older CUDA version that is incompatible.
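A minimal sketch of what the adapted lookup could look like. The function and variable names here are illustrative, not the actual code in `cumm/common.py`; the point is only the extra `$env/include` candidate:

```python
# Illustrative sketch, not the actual logic in cumm/common.py.
# Idea: after resolving nvcc, also accept the micromamba layout where the
# headers sit directly under $env/include instead of $env/targets/<arch>/include.
import shutil
from pathlib import Path

def find_cuda_include(nvcc_path: str) -> Path:
    env_root = Path(nvcc_path).parent.parent  # <env> for <env>/bin/nvcc
    candidates = [
        env_root / "targets" / "x86_64-linux" / "include",  # conda / CUDA toolkit layout
        env_root / "include",                               # micromamba: $env/include/cuda.h
        Path("/usr/local/cuda/include"),                    # current system-wide fallback
    ]
    for cand in candidates:
        if (cand / "cuda.h").exists():
            return cand
    raise FileNotFoundError(f"cuda.h not found relative to {nvcc_path}")

if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        print(find_cuda_include(nvcc))
```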
Temporary Workaround
Create `$env/targets/x86_64-linux` and symlink `$env/include` into it, so that `$env/targets/x86_64-linux/include` points to `$env/include`:
mkdir -p $env/targets/x86_64-linux
cd $env/targets/x86_64-linux
ln -s ../../include .
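To double-check that the symlink puts the headers where cumm looks for them, a quick illustrative check (`CONDA_PREFIX` is set by `micromamba activate`) should now print `True` for the header NVRTC failed to open:

```python
# Illustrative check only: verify cuda_fp8.h is reachable via the targets/
# layout after the symlink workaround.
import os
from pathlib import Path

env = Path(os.environ["CONDA_PREFIX"])  # active micromamba env prefix
print((env / "targets" / "x86_64-linux" / "include" / "cuda_fp8.h").exists())
```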