Could not find compatible tinycudann extension for compute capability 80

Open ryansburgoyne opened this issue 2 years ago • 2 comments

I'm trying to create a Kubernetes pod with the nerfstudio container image on CoreWeave using this spec:

apiVersion: v1
kind: Pod
metadata:
  name: radiant
spec:
  containers:
  - name: nerfstudio
    image: dromni/nerfstudio:0.1.14
    resources:
      limits:
        cpu: 4
        memory: 16Gi
        nvidia.com/gpu: 1
        
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu.nvidia.com/class
            operator: In
            values:
              - A100_NVLINK

However, the install script fails with this log:

==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[20:32:14] 🤷 .zshrc not found, skipping.                                                                 install.py:210
           🔍 Found .bashrc!                                                                              install.py:212
[20:32:15] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-install-cli!      install.py:117
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-dev-test!         install.py:117
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-process-data!     install.py:117
[20:32:17] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-download-data!    install.py:117
[20:32:19] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-render!           install.py:117
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-eval!             install.py:117
           ❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash']        install.py:107
           Traceback (most recent call last):                                                             install.py:111
             File "/home/user/.local/bin/ns-train", line 5, in <module>                                                 
               from scripts.train import entrypoint                                                                     
             File "/home/user/nerfstudio/scripts/train.py", line 50, in <module>                                        
               from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion                                   
             File "/home/user/nerfstudio/nerfstudio/configs/method_configs.py", line 46, in <module>                    
               from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind                      
             File "/home/user/nerfstudio/nerfstudio/field_components/__init__.py", line 17, in <module>                 
               from .encodings import Encoding, ScalingAndOffset                                                        
             File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in <module>                
               import tinycudann as tcnn                                                                                
             File "/home/user/.local/lib/python3.10/site-packages/tinycudann/__init__.py", line 9, in                   
           <module>                                                                                                     
               from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network,                 
           Encoding                                                                                                     
             File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 35, in                   
           <module>                                                                                                     
               raise EnvironmentError(f"Could not find compatible tinycudann extension for compute                      
           capability {system_compute_capability}.")                                                                    
           OSError: Could not find compatible tinycudann extension for compute capability 80.                           
                                                                                                                        
Traceback (most recent call last):
  File "/home/user/.local/bin/ns-install-cli", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/user/nerfstudio/scripts/completions/install.py", line 282, in entrypoint
    tyro.cli(main, description=__doc__)
  File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 127, in cli
    _cli_impl(
  File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 328, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File "/home/user/.local/lib/python3.10/site-packages/tyro/_calling.py", line 194, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File "/home/user/nerfstudio/scripts/completions/install.py", line 251, in main
    completion_paths = list(
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/nerfstudio/scripts/completions/install.py", line 253, in <lambda>
    lambda path_or_entrypoint_and_shell: _generate_completion(
  File "/home/user/nerfstudio/scripts/completions/install.py", line 112, in _generate_completion
    raise e
  File "/home/user/nerfstudio/scripts/completions/install.py", line 99, in _generate_completion
    new = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-train', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.

Is anyone able to point me towards fixing this error?

ryansburgoyne, Dec 30 '22 20:12

I am getting exactly the same error, but on Ubuntu.

gtoos, Dec 31 '22 20:12

Upgrading torch solved this issue for me on a GPU with compute capability 75. I think the cause is that the torch version pinned in the repo does not match the CUDA version that tiny-cuda-nn expects.

pip3 install --upgrade torch torchvision torchaudio
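
Before and after upgrading, it may help to see which versions are actually in play. Below is a minimal sketch, not part of nerfstudio or tiny-cuda-nn: the helper names `capability_string` and `describe_environment` are made up for illustration, and it assumes the number in the error message (80, 75) is the major and minor compute capability joined together, as torch reports them.

```python
import importlib.util


def capability_string(major: int, minor: int) -> str:
    """Join a (major, minor) compute capability, e.g. (8, 0) -> "80"."""
    return f"{major}{minor}"


def describe_environment() -> dict:
    """Collect versions relevant to the tinycudann error.

    Anything that cannot be determined is left as None.
    Hypothetical helper, written only for this thread.
    """
    info = {
        "torch": None,
        "torch_cuda": None,
        "capability": None,
        # find_spec only locates the package; it does not import it,
        # so it will not trigger the OSError from the traceback.
        "tinycudann_installed": importlib.util.find_spec("tinycudann") is not None,
    }
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    # CUDA version torch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        info["capability"] = capability_string(major, minor)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_cuda` differs from the CUDA version shown in the container banner (11.7.1 in the log above), that would be consistent with the mismatch suspected here, though it does not prove it.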

If this does not work, I think the best way to debug is to check whether the example samples/mlp_learning_an_image_pytorch.py in the tiny-cuda-nn repo (https://github.com/NVlabs/tiny-cuda-nn) runs, and to follow their installation steps.
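
A quicker check than running the full sample is to import tinycudann on its own, since the traceback above shows the error is raised at import time. This is a rough sketch; `check_tinycudann` is a hypothetical helper name, and it only distinguishes the failure modes visible in this thread.

```python
def check_tinycudann() -> str:
    """Try to import tinycudann and report what happened.

    Hypothetical helper for illustration; it does not run any
    tiny-cuda-nn code beyond the import itself.
    """
    try:
        import tinycudann  # noqa: F401
    except ImportError as exc:
        # Package (or one of its dependencies, such as torch) is missing.
        return f"not installed: {exc}"
    except OSError as exc:
        # The case from the traceback: the package is present, but no
        # compiled extension matches this GPU's compute capability.
        return f"installed but unusable: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_tinycudann())
```

If this prints "installed but unusable", the ns-install-cli step is not the real problem; it only fails because ns-train imports tinycudann while generating its completion script.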

Linkerbrain, Jan 02 '23 16:01