nerfstudio
Could not find compatible tinycudann extension for compute capability 80
I'm trying to create a Kubernetes pod with the nerfstudio container image on CoreWeave using this spec:
apiVersion: v1
kind: Pod
metadata:
  name: radiant
spec:
  containers:
    - name: nerfstudio
      image: dromni/nerfstudio:0.1.14
      resources:
        limits:
          cpu: 4
          memory: 16Gi
          nvidia.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu.nvidia.com/class
                operator: In
                values:
                  - A100_NVLINK
However, the install script fails with this log:
==========
== CUDA ==
==========
CUDA Version 11.7.1
Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
[20:32:14] 🤷 .zshrc not found, skipping. install.py:210
🔍 Found .bashrc! install.py:212
[20:32:15] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-install-cli! install.py:117
✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-dev-test! install.py:117
✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-process-data! install.py:117
[20:32:17] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-download-data! install.py:117
[20:32:19] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-render! install.py:117
✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-eval! install.py:117
❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash'] install.py:107
Traceback (most recent call last): install.py:111
File "/home/user/.local/bin/ns-train", line 5, in <module>
from scripts.train import entrypoint
File "/home/user/nerfstudio/scripts/train.py", line 50, in <module>
from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion
File "/home/user/nerfstudio/nerfstudio/configs/method_configs.py", line 46, in <module>
from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind
File "/home/user/nerfstudio/nerfstudio/field_components/__init__.py", line 17, in <module>
from .encodings import Encoding, ScalingAndOffset
File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in <module>
import tinycudann as tcnn
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/__init__.py", line 9, in <module>
from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 35, in <module>
raise EnvironmentError(f"Could not find compatible tinycudann extension for compute capability {system_compute_capability}.")
OSError: Could not find compatible tinycudann extension for compute capability 80.
Traceback (most recent call last):
File "/home/user/.local/bin/ns-install-cli", line 8, in <module>
sys.exit(entrypoint())
File "/home/user/nerfstudio/scripts/completions/install.py", line 282, in entrypoint
tyro.cli(main, description=__doc__)
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 127, in cli
_cli_impl(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 328, in _cli_impl
out, consumed_keywords = _calling.call_from_args(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_calling.py", line 194, in call_from_args
return unwrapped_f(*args, **kwargs), consumed_keywords # type: ignore
File "/home/user/nerfstudio/scripts/completions/install.py", line 251, in main
completion_paths = list(
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/nerfstudio/scripts/completions/install.py", line 253, in <lambda>
lambda path_or_entrypoint_and_shell: _generate_completion(
File "/home/user/nerfstudio/scripts/completions/install.py", line 112, in _generate_completion
raise e
File "/home/user/nerfstudio/scripts/completions/install.py", line 99, in _generate_completion
new = subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-train', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.
Is anyone able to point me towards fixing this error?
I am getting the exact same error, but on Ubuntu.
Upgrading torch solved this issue for me on a GPU with compute capability 75. I think it is caused by the torch version pinned in the repo not matching the CUDA version that tiny-cuda-nn expects.
pip3 install --upgrade torch torchvision torchaudio
If this does not work, I think the best way to debug is to check whether the example samples/mlp_learning_an_image_pytorch.py in the tiny-cuda-nn repo (https://github.com/NVlabs/tiny-cuda-nn) runs after following their installation steps.
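Before upgrading or rebuilding anything, it may help to print what torch itself reports, since the error message quotes the compute capability of the GPU it detected. The following is a minimal sketch, not part of nerfstudio or tiny-cuda-nn: the helper names capability_to_int and describe_environment are made up for illustration, and the major * 10 + minor formula is my assumption about how the number in the error (80 for an A100, 75 in my case) is derived.

```python
import importlib.util


def capability_to_int(major, minor):
    """Combine a (major, minor) compute capability into one number, e.g. (8, 0) -> 80.

    Assumption: this mirrors how the number in the tinycudann error is formed.
    """
    return major * 10 + minor


def describe_environment():
    """Collect the torch and CUDA details relevant to this error.

    Returns a dict; the "torch" entry is None when torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}

    import torch

    info = {
        "torch": torch.__version__,
        # CUDA version torch was built against (None for CPU-only builds).
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        info["device_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = capability_to_int(major, minor)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If torch_cuda differs from the CUDA version shown in the container banner (11.7.1 in the log above), or cuda_available is False, that would be consistent with the version mismatch I suspected, though I have not verified this on an A100.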