exo: NVIDIA NVML error breaks startup in Docker without an NVIDIA GPU

I don't have an NVIDIA GPU, and I run exo in Docker.

docker run ubuntu
git clone exo
apt install build-essential python3 python3-venv python3-pip libgl1-mesa-dev libglib2.0-0
source install.sh
It then reports this error:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: None

(exo ASCII-art banner)

Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[58906]
Chat interface started:
 - http://127.0.0.1:52415
 - http://172.17.0.2:52415
ChatGPT API endpoint served at:
 - http://127.0.0.1:52415/v1/chat/completions
 - http://172.17.0.2:52415/v1/chat/completions
Traceback (most recent call last):
  File "/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 2248, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/exo/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/exo/exo/main.py", line 131, in <module>
    node = Node(
           ^^^^^
  File "/exo/exo/orchestration/node.py", line 40, in __init__
    self.device_capabilities = device_capabilities()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/exo/exo/topology/device_capabilities.py", line 151, in device_capabilities
    return linux_device_capabilities()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/exo/exo/topology/device_capabilities.py", line 189, in linux_device_capabilities
    pynvml.nvmlInit()
  File "/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 2220, in nvmlInit
    nvmlInitWithFlags(0)
  File "/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 2203, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 2250, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
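
For anyone who wants to confirm the root cause without starting exo: the first traceback shows pynvml failing to load libnvidia-ml.so.1 through ctypes. A minimal probe for that same condition might look like the sketch below. The helper name nvml_library_present is made up for illustration and is not part of exo or pynvml; it only uses the standard-library ctypes module.

```python
import ctypes


def nvml_library_present(name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the NVML shared library can be loaded on this machine.

    Hypothetical helper for illustration only; it attempts the same
    ctypes load that appears in the pynvml traceback.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    print("NVML library present:", nvml_library_present())
```

In a plain ubuntu container with no NVIDIA driver mounted, this would be expected to print False, which matches the OSError in the log.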

Dec 14 '24 06:12 2jiangjiang

I checked, and tinygrad.Device.DEFAULT returns "GPU". When I deleted Device.DEFAULT == "GPU" from the NVIDIA case, exo worked. I don't know whether it works properly with oneAPI (Intel GPU).

Dec 14 '24 07:12 2jiangjiang

I checked, and tinygrad.Device.DEFAULT returns "GPU". When I deleted Device.DEFAULT == "NV" from the NVIDIA case, exo worked. I don't know whether it works properly with oneAPI (Intel GPU).

You'll need to install the prerequisites listed in the README:

For Linux with NVIDIA GPU support (Linux-only, skip if not using Linux or NVIDIA):

NVIDIA driver - verify with nvidia-smi
CUDA toolkit - install from NVIDIA CUDA guide, verify with nvcc --version
cuDNN library - download from NVIDIA cuDNN page, verify installation by following these steps

Dec 14 '24 20:12 AlexCheema

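I checked, and tinygrad.Device.DEFAULT returns "GPU". When I deleted Device.DEFAULT == "NV" from the NVIDIA case, exo worked. I don't know whether it works properly with oneAPI (Intel GPU).

You'll need to install the prerequisites listed in the README:

For Linux with NVIDIA GPU support (Linux-only, skip if not using Linux or NVIDIA):

NVIDIA driver - verify with nvidia-smi
CUDA toolkit - install from NVIDIA CUDA guide, verify with nvcc --version
cuDNN library - download from NVIDIA cuDNN page, verify installation by following these steps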

I'm not using an NVIDIA GPU; I'm using an Intel GPU. The code still enters the NVIDIA case, which is incorrect, so this is a bug and needs a patch.
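
One possible shape for such a patch, as a rough sketch only: treat the device name as a hint and take the NVIDIA path only when NVML can actually be initialized. The function name pick_capabilities_branch and the return labels are hypothetical and do not come from exo; the device names "NV" and "GPU" are the two mentioned in this thread.

```python
def pick_capabilities_branch(default_device: str, nvml_ok: bool) -> str:
    """Decide which capability-detection path to take.

    default_device: the value reported by tinygrad.Device.DEFAULT.
    nvml_ok: whether NVML could be loaded and initialized on this host.
    Hypothetical helper; exo's real dispatch may look different.
    """
    # "GPU" is ambiguous: on the machine in this thread it was reported
    # for an Intel GPU, so the name alone should not imply NVIDIA.
    nvidia_hints = {"NV", "GPU"}
    if default_device in nvidia_hints and nvml_ok:
        return "nvidia"
    # Fall back to a generic path instead of crashing at startup.
    return "generic"


if __name__ == "__main__":
    # The situation reported here: Device.DEFAULT is "GPU", no NVML.
    print(pick_capabilities_branch("GPU", nvml_ok=False))
```

With this kind of guard the node would start with generic capabilities on an Intel GPU rather than raising NVMLError_LibraryNotFound.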

Dec 16 '24 03:12 2jiangjiang

I checked, and tinygrad.Device.DEFAULT returns "GPU". When I deleted Device.DEFAULT == "NV" from the NVIDIA case, exo worked. I don't know whether it works properly with oneAPI (Intel GPU).

My mistake: I deleted Device.DEFAULT == "GPU", not Device.DEFAULT == "NV".

Dec 16 '24 03:12 2jiangjiang

Ni Hao @2jiangjiang, if you want to go ahead and craft a line with the device name specs for your card, I can add it to the CHIP_FLOPS list for my Intel Arc Support PR...

Should look something like this: https://github.com/exo-explore/exo/pull/791/files#diff-cf2f88e490e7f1b3c6256e98545897497902d040113f29dafc5fc6054b6b2151R144
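
To give a rough idea of what such a line might look like, here is a self-contained sketch. The DeviceFlops stand-in, its field names, the card name "Example Intel GPU", and the 0.0 values are all placeholders for illustration; check the linked diff for the real shape and take the numbers from the vendor spec sheet for your card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceFlops:
    """Stand-in record for per-precision throughput, in TFLOPS (assumed shape)."""
    fp32: float
    fp16: float
    int8: float


TFLOPS = 1.0

# Placeholder entry: the name and the 0.0 values are NOT real specs.
CHIP_FLOPS = {
    "Example Intel GPU": DeviceFlops(fp32=0.0 * TFLOPS, fp16=0.0 * TFLOPS, int8=0.0 * TFLOPS),
}


def flops_for(device_name: str) -> DeviceFlops:
    """Look up a device by name, falling back to zeros when it is unknown."""
    return CHIP_FLOPS.get(device_name, DeviceFlops(fp32=0.0, fp16=0.0, int8=0.0))


if __name__ == "__main__":
    print(flops_for("Example Intel GPU"))
```

The key would need to match the device name string your system reports for the card, otherwise the lookup falls through to the default.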

Mar 21 '25 05:03 deftdawg