
Does not work with CUDA from micromamba environment.

Open · qzed opened this issue 9 months ago · 0 comments

When installing CUDA and cumm in a micromamba environment, cumm fails at runtime.

Steps to reproduce:

# set up new env
micromamba create -n testenv -c conda-forge python=3.11
micromamba activate testenv

# install cuda
micromamba install cuda -c nvidia/label/cuda-12.1.1

# install spconv
pip install spconv-cu121

# run some code relying on cumm/spconv...

I suspect reproducing this also requires running on an "old" GPU (a Titan X in my case), so that spconv has no prebuilt kernels and falls back to NVRTC compilation at runtime.
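
For concreteness, a minimal snippet of the kind of code that triggers this might look like the following. This is a hypothetical example, not taken from the original report; it assumes PyTorch with CUDA support is installed alongside spconv:

import torch
import spconv.pytorch as spconv

# one active voxel with 16 feature channels; indices are (batch, z, y, x)
features = torch.randn(1, 16, device="cuda")
indices = torch.zeros(1, 4, dtype=torch.int32, device="cuda")
x = spconv.SparseConvTensor(features, indices, spatial_shape=[8, 8, 8], batch_size=1)
conv = spconv.SubMConv3d(16, 32, kernel_size=3).cuda()
y = conv(x)  # on GPUs without prebuilt kernels, this kicks off the NVRTC compilation that fails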

Error / Details

Excerpt of the error:
│ /home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/tensorview/__init__.py:323 in         │
│ __init__                                                                                                             │
│                                                                                                                      │
│   320 │   │   if name_exprs is None:                                                                                 │
│   321 │   │   │   name_exprs = []                                                                                    │
│   322 │   │   if isinstance(code, str):                                                                              │
│ ❱ 323 │   │   │   self._mod = _NVRTCModule(code, headers, opts, program_name,                                        │
│   324 │   │   │   │   │   │   │   │   │    name_exprs, cudadevrt_path)                                               │
│   325 │   │   else:                                                                                                  │
│   326 │   │   │   self._mod = _NVRTCModule(code, cudadevrt_path)                                                     │
│                                                                                                                      │
│ ╭───────────────────────────────────────────────────── locals ─────────────────────────────────────────────────────╮ │
│ │           code = '#include <cumm/common/TensorViewNVRTC.h>\n#include <cumm/common/TensorViewNVRTCKe'+2175        │ │
│ │ cudadevrt_path = ''                                                                                              │ │
│ │        headers = {                                                                                               │ │
│ │                  │   'cumm/common/TensorViewNVRTC.h': '#pragma once\n#include <tensorview/core/all.h>\n#include  │ │
│ │                  <tensorview/core/arrayops'+230,                                                                 │ │
│ │                  │   'cumm/common/TensorViewNVRTCKernel.h': '#pragma once\n#include                              │ │
│ │                  <tensorview/cuda/device_ops.h>\n#include <tensorview/gemm/d'+215,                               │ │
│ │                  │   'cumm/gemm/layout/RowMajor.h': '#pragma once\n#include                                      │ │
│ │                  <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+845,                              │ │
│ │                  │   'cumm/gemm/layout/ColumnMajor.h': '#pragma once\n#include                                   │ │
│ │                  <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+857,                              │ │
│ │                  │   'cumm/common/GemmBasic.h': '#pragma once\n#include                                          │ │
│ │                  <tensorview/core/nvrtc/runtime_include.h>\n#include <tensor'+254,                               │ │
│ │                  │   'cumm/common/GemmBasicKernel.h': '#pragma once\n#include                                    │ │
│ │                  <tensorview/gemm/arch/memory.h>\n#include <tensorview/gemm/'+245,                               │ │
│ │                  │   'cumm/gemm/utils/GemmUtilsCPU.h': '#pragma once\n#include                                   │ │
│ │                  <cumm/common/TensorViewNVRTC.h>\nnamespace cumm {\nnamespace'+851,                              │ │
│ │                  │   'spconv/gemmutils/GemmUtils.h': '#pragma once\n#include                                     │ │
│ │                  <cumm/common/TensorViewNVRTC.h>\nnamespace spconv {\nnamespa'+985,                              │ │
│ │                  │   'spconv/inpitera/maskiter/PitchLinear.h': '#pragma once\n#include                           │ │
│ │                  <cumm/common/TensorViewNVRTC.h>\nnamespace spconv {\nnamespa'+352,                              │ │
│ │                  │   'spconv/inpitera/maskiter/MaskTileIteratorParams.h': '#pragma once\nnamespace spconv        │ │
│ │                  {\nnamespace inpitera {\nnamespace maskiter {\nstruct'+613,                                     │ │
│ │                  │   ... +36                                                                                     │ │
│ │                  }                                                                                               │ │
│ │     name_exprs = ['spconv::gemm_kernel', 'spconv::nvrtc_kernel_cpu_out', '&spconv::params_raw']                  │ │
│ │   name_to_meta = {                                                                                               │ │
│ │                  │   'spconv::gemm_kernel': NVRTCKernelMeta[name=gemm_kernel,ns=spconv,args=[]],                 │ │
│ │                  │   'spconv::nvrtc_kernel_cpu_out':                                                             │ │
│ │                  NVRTCKernelMeta[name=nvrtc_kernel_cpu_out,ns=spconv,args=[NVRTCArgMeta(base_type=<NVRTCArgBase… │ │
│ │                  0>, valid=True, simple_type=-1, shape=[1], is_simple_ptr=False, is_scalar=True, count=None),    │ │
│ │                  NVRTCArgMeta(base_type=<NVRTCArgBaseType.Scalar: 0>, valid=True, simple_type=-1, shape=[1],     │ │
│ │                  is_simple_ptr=False, is_scalar=True, count=None)]]                                              │ │
│ │                  }                                                                                               │ │
│ │           opts = [                                                                                               │ │
│ │                  │   '-std=c++14',                                                                               │ │
│ │                  │   '--gpu-architecture=sm_61',                                                                 │ │
│ │                  │   '-DTV_ENABLE_HARDWARE_ACC',                                                                 │ │
│ │                  │   '-I',                                                                                       │ │
│ │                  │   '/usr/local/cuda/include',                                                                  │ │
│ │                  │   '-I',                                                                                       │ │
│ │                  │   '/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/i'+6        │ │
│ │                  ]                                                                                               │ │
│ │   program_name = 'kernel.cu'                                                                                     │ │
│ │           self = <cumm.nvrtc.CummNVRTCModule object at 0x7fab1c39ec50>                                           │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

RuntimeError: /io/include/tensorview/cuda/nvrtc.h(96)
compileResult == NVRTC_SUCCESS assert faild. nvrtc compile failed. log:
 /home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/core/nvrtc/limits.h(689): warning #61-D: integer operation result is out of range
      return -INT_MIN - 1;
             ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/core/nvrtc/limits.h(689): warning #61-D: integer operation result is out of range
      return -INT_MIN - 1;
                      ^

/home/luz/.local/opt/micromamba/envs/testenv/lib/python3.11/site-packages/cumm/include/tensorview/gemm/dtypes/float8.h(53): catastrophic error: cannot open source file "cuda_fp8.h"
  #include <cuda_fp8.h>
                       ^

1 catastrophic error detected in the compilation of "kernel.cu.cu".
Compilation terminated.

As you can see in the log above, cumm adds /usr/local/cuda/include to the include directories. nvcc itself works and is in the environment: which nvcc returns /home/luz/.local/opt/micromamba/envs/tracker/bin/nvcc.

What I believe goes wrong:

https://github.com/FindDefinition/cumm/blob/e6a95bd2f13e767ff91aa60742f361ebdd1c880b/cumm/common.py#L269-L277

  • First off: cumm finds the correct nvcc path, so that part is fine.
  • Then it tries to locate the include and lib directories. Unfortunately, here micromamba may differ from conda: in the environment I have $env/lib/libcudart.so (this is fine) but no $env/targets directory. Instead, the headers live directly in $env/include, including $env/include/cuda.h. So I believe the code snippet above should be adapted to also accept Path(nvcc_path).parent.parent / "include" as a valid option (see the sketch after this list).
  • Right now, with that check failing, cumm falls back to /usr/local/cuda/include, which on my machine contains an older, incompatible CUDA version.
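
For illustration, here is a minimal sketch of how that lookup could be adapted. This is not the actual cumm code; the function name and candidate order are my own:

from pathlib import Path

def _find_cuda_include(nvcc_path: str) -> Path:
    # hypothetical sketch, not the real cumm/common.py logic
    root = Path(nvcc_path).parent.parent  # $env when nvcc is at $env/bin/nvcc
    candidates = [
        root / "targets" / "x86_64-linux" / "include",  # conda-style layout
        root / "include",                               # micromamba layout ($env/include/cuda.h)
    ]
    for inc in candidates:
        if (inc / "cuda.h").exists():
            return inc
    # current fallback, which may point at an older system CUDA
    return Path("/usr/local/cuda/include")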

Temporary Workaround

Create a symlink at $env/targets/x86_64-linux/include pointing to $env/include:

mkdir -p $env/targets/x86_64-linux
cd $env/targets/x86_64-linux
ln -s ../../include .
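
To sanity-check the workaround (my own addition, not part of the original report), you can verify that the header which previously failed to resolve is now reachable through the path cumm expects:

import os
from pathlib import Path

env = Path(os.environ["CONDA_PREFIX"])  # set by micromamba when the env is active
print((env / "targets" / "x86_64-linux" / "include" / "cuda_fp8.h").exists())  # expect True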

qzed · Mar 13 '25