Support for Jetson?
Hello,
I tried to launch exo on different versions of JetPack but couldn't get it working on a Jetson Orin because device_capabilities can't be discovered. If there is a way to use a Jetson with exo, any guide would be very helpful.
Thanks,
Giray
I've never tried this but I don't see why it wouldn't work. Can you send the logs with DEBUG=6? What do you mean by "device_capabilities can't be discovered"?
Hi Alex, this error message is from a Jetson Orin 16 GB with a freshly installed JetPack 5.1.4 (all components):

```
~/exo$ DEBUG=6 exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=62480 []
Using available port: 62480
Generated and stored new node ID: 7ce7ce87-6cbd-4009-9c84-2a13ca626afc
Chat interface started:
 - http://127.0.0.1:8000
 - http://192.168.1.103:8000
 - http://172.17.0.1:8000
ChatGPT API endpoint served at:
 - http://127.0.0.1:8000/v1/chat/completions
 - http://192.168.1.103:8000/v1/chat/completions
 - http://172.17.0.1:8000/v1/chat/completions
tinygrad Device.DEFAULT='CUDA'
Traceback (most recent call last):
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1798, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/orin/.venv/bin/exo", line 5, in <module>
```

The following error is from JetPack 6.1:
```
(.venv) orin@orin-desktop:~/exo$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Chat interface started:
 - http://172.17.0.1:8000
 - http://127.0.0.1:8000
 - http://192.168.1.103:8000
ChatGPT API endpoint served at:
 - http://172.17.0.1:8000/v1/chat/completions
 - http://127.0.0.1:8000/v1/chat/completions
 - http://192.168.1.103:8000/v1/chat/completions
Traceback (most recent call last):
  File "/home/orin/exo/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/home/orin/exo/exo/main.py", line 82, in <module>
    node = StandardNode(
  File "/home/orin/exo/exo/orchestration/standard_node.py", line 38, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 139, in device_capabilities
    return linux_device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 180, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
```

(After this, the TUI draws empty "Exo Cluster (0 nodes)" and "Download Progress" panels.)
Here is the nvidia-smi output for Jetpack 6.1:
@gyillikci Did you make any progress on the Jetson? I'm looking forward to trying it on a bunch of Jetson Nano boards. Please let me know if you have made any progress. Thanks.
Hi,
Unfortunately, I haven't tried it since that post, but the repo seems very active.
Thanks for the feedback.
Man, I'm also trying to run it on a Jetson AGX Orin. It looks like the best option would be to set up a "jetson-container", but no luck... I think the main issue is exo's requirement of Python >= 3.12.0, while the latest GPU-aware Python version available for Jetson is 3.10...
I was able to just use pyenv to install Python 3.12 (and create the virtualenv to add the torch deps) on a Jetson Nano 4 GB and a Nano 2 GB, and was able to build and run exo, but noticed I was missing libs.
I used jetson-container, but the Nanos can only go up to L4T r32 (I tried to have jetson-container build with r36, but it just disregarded that and used r32 on Ubuntu 22.04), and I think libnvidia-ml was only included in L4T r36 or so, which is where I last left off.
Any update on this? I am looking to do the same: use multiple Jetson devices for cluster inference.
That would be cool. I have a couple of Nanos and an NX I could dust off.
I tried this today and I get the same error. It seems that NVML isn't supported on Jetson devices. I found the following test script online:
```python
import pynvml
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
```
It doesn't work and errors with pynvml.NVMLError_Uninitialized: Uninitialized
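As an aside, pynvml requires `pynvml.nvmlInit()` before any other call, so the script above would raise `NVMLError_Uninitialized` even on a desktop GPU; the Jetson-specific failure only shows up once NVML is initialized. A hedged probe sketch (the helper name is mine, not from pynvml or exo) that initializes first and reports whatever NVML says:

```python
def probe_nvml() -> str:
    """Return a one-line NVML status string (illustrative helper).

    On a desktop NVIDIA box this should report total GPU memory; on a
    Jetson, or any machine without libnvidia-ml, it reports the failure
    instead of crashing.
    """
    try:
        import pynvml
        pynvml.nvmlInit()  # required before any other NVML query
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
        pynvml.nvmlShutdown()
        return f"GPU memory: {total} bytes"
    except Exception as exc:  # ImportError, NVMLError_NotSupported, etc.
        return f"NVML unavailable: {exc!r}"

print(probe_nvml())
```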
I think the main issue is that exo needs Python 3.12, which in turn needs newer libraries... Maybe a better approach would be to get someone from the Jetson team to build and release PyTorch for Python 3.12...
Lagging (or early-EOL) support from NVIDIA on those devices is why mine are collecting dust and why I got off that treadmill. Looking forward to future NPUs instead.
I was able to install Python 3.12 and set up a virtual environment. I got exo installed, but it won't run because of pynvml. python3-pynvml isn't available in the standard repos, even for older versions of Python 3. From reading around online, the NVIDIA Management Library (NVML) is a PC-only thing and isn't supported on Jetson.
My guess is that when exo sees Linux and an NVIDIA GPU, it expects a PC, not a Jetson. It then uses NVML, which, as I said, isn't available on the Jetson platform.
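For what it's worth, a Jetson can be told apart from a desktop NVIDIA box without touching NVML. A hedged sketch (the paths are the usual L4T markers; this is not something exo currently checks):

```python
from pathlib import Path

def is_jetson() -> bool:
    """Heuristic Jetson detection (illustrative, not part of exo).

    L4T/JetPack images expose a device-tree model string and an
    /etc/nv_tegra_release file that x86 machines with discrete
    NVIDIA GPUs do not have.
    """
    model = Path("/proc/device-tree/model")
    try:
        if model.exists() and "NVIDIA" in model.read_text(errors="ignore"):
            return True
    except OSError:
        pass
    return Path("/etc/nv_tegra_release").exists()

print(is_jetson())
```

A check like this could gate the NVML calls in `linux_device_capabilities()` so Jetson boards take a different memory-discovery path.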
@garyexplains, I think so too. They considered PC machines and Mac devices, but this project is very suitable for Jetson devices.
Especially the new Orin Nano Super. It's on backorder now, but it is prime for some exo action at 8 GB for only $249.
lol. Just as that was posted, I got an email from NVIDIA trying to sell me one: "The Most Affordable Generative AI Supercomputer". Perhaps they will support more mainstream toolkits now?
Getting "pynvml.NVMLError_NotSupported" error with NVIDIA Jetson Orin NX:
```
$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: None
Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[53754]
Chat interface started:
 - http://172.20.0.1:52415
 - http://172.19.0.1:52415
 - http://172.17.0.1:52415
 - http://192.168.1.204:52415
 - http://100.104.92.40:52415
 - http://127.0.0.1:52415
ChatGPT API endpoint served at:
 - http://172.20.0.1:52415/v1/chat/completions
 - http://172.19.0.1:52415/v1/chat/completions
 - http://172.17.0.1:52415/v1/chat/completions
 - http://192.168.1.204:52415/v1/chat/completions
 - http://100.104.92.40:52415/v1/chat/completions
 - http://127.0.0.1:52415/v1/chat/completions
Traceback (most recent call last):
  File "/home/nvidia/.local/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/home/nvidia/.local/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/nvidia/Projects/exo/exo/main.py", line 131, in <module>
    node = Node(
  File "/home/nvidia/Projects/exo/exo/orchestration/node.py", line 40, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 151, in device_capabilities
    return linux_device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 193, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
```
My environment is as follows:
Here is a fix, tested on Jetson Orin AGX: https://github.com/D-G-Dimitrov/exo/commit/edf3bd54e7988ada0148ca429812c7f8ff76fe1f
```python
...
gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
if gpu_raw_name == 'ORIN (NVGPU)':  # In case of a Jetson device
  gpu_memory_info = get_jetson_device_meminfo()
else:
  gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
if DEBUG >= 2: print(f"NVIDIA device {gpu_name=} {gpu_memory_info=}")
...

def get_jetson_device_meminfo():
  from re import search
  from pynvml import c_nvmlMemory_t

  def extract_numeric_value(text):
    """Extract the first numeric value from a string."""
    match = search(r'\d+', text)
    return int(match.group()) if match else 0

  # Read total and free memory from /proc/meminfo
  with open("/proc/meminfo") as fp:
    total_memory = extract_numeric_value(fp.readline())
    free_memory = extract_numeric_value(fp.readline())

  # Calculate used memory
  used_memory = total_memory - free_memory

  # Return memory info object
  return c_nvmlMemory_t(
    total=total_memory * 1000,
    free=free_memory * 1000,
    used=used_memory * 1000
  )
```
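Two nits worth flagging on that patch: /proc/meminfo values are in kiB, so multiplying by 1024 (not 1000) gives exact bytes, and matching fields by name is sturdier than relying on the first two lines being MemTotal and MemFree. A small sketch of that variant (the `parse_meminfo` name is mine, not from the commit):

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to values in bytes (kiB * 1024)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0]) * 1024
    return info

# On a Jetson you would pass open("/proc/meminfo").read(); a canned sample:
sample = "MemTotal:       16303680 kB\nMemFree:        10079168 kB\n"
mi = parse_meminfo(sample)
print(mi["MemTotal"] - mi["MemFree"])  # bytes in use
```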
@D-G-Dimitrov I confirm that this works! Thank you so much.
I am not seeing a TFLOPS rating for the NVIDIA Orin box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues.
Same for my Orin Nano. Did you solve this issue?
I did not.
Looks like there is no "easy" way to calculate the TFLOPS rating of a device, so it was just hardcoded like this: https://github.com/exo-explore/exo/blob/aa1ce21f820d8412dfac95957450c647aa7373d5/exo/topology/device_capabilities.py#L58 I think the TFLOPS rating is just UI "sugar"; it should not prevent the app from running.
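In other words, exo's approach is roughly a name-to-TFLOPS lookup with a zero fallback. A simplified sketch (the dict entries are illustrative; the real, much larger table lives in device_capabilities.py at the link above):

```python
# Illustrative subset of a hardcoded chip -> fp32 TFLOPS table.
CHIP_TFLOPS = {
    "NVIDIA GEFORCE RTX 4090": 82.58,
    "NVIDIA GEFORCE RTX 3090": 35.6,
}

def lookup_tflops(chip_name: str) -> float:
    """Return a hardcoded fp32 TFLOPS figure, or 0 for unknown chips.

    The zero fallback is why Jetson boards show no TFLOPS rating:
    'ORIN (NVGPU)' simply is not in the table.
    """
    return CHIP_TFLOPS.get(chip_name.upper(), 0.0)

print(lookup_tflops("ORIN (NVGPU)"))  # 0.0
```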
Just check whether the GPU is really being used; then who cares about the indicator?
For me, the Orin is not utilising the GPU; it is slow even running Llama 1B.
Yeah, I noticed that as well.
I tried the same model via exo (just a single node, testing it solo) and via Ollama. Monitoring both with jtop, I could see each loaded the model at a similar size (not at the same time; I ran one, restarted, then ran the other to make sure everything was unloaded).
I was getting ~3-4 tokens per second via exo, and 35-40 on Ollama.
Is it because I did not install PyTorch or TensorFlow? I was only able to install Flax.
Yeah, that makes sense if you were using Flax. When I was testing, I already had PyTorch installed (built on-device with GPU enabled, using a script from NVIDIA I found on their forum), but mine was/is still slow.
I am having trouble doing additional testing, though. After moving the models directory over to the SSD and setting the EXO_HOME var, I re-downloaded the Llama 3.2 1B model, and when I went to test, it now keeps trying to load the model multiple times, fills up the device's memory, and locks up every time.
I wish there was a way to make exo utilize Ollama, since Ollama works perfectly on all the devices I own (an NVIDIA GPU in a container, an AMD video card in my main PC, and multiple different Jetson devices) without requiring any changes.
I've had a Jetson cluster for some time with a Jetson Mate board. NVIDIA's only guides for GPU-enabled distributed workloads were via Slurm or GPU-enabled Docker containers in a Kubernetes cluster. Those approaches are suited to orchestrating AI workloads rather than running an LLM across devices.
But there is still no mature solution for taking advantage of Jetsons as a cluster.
What have you guys tried? exo would have been perfect... if it worked.