
Support for Jetson?

Open gyillikci opened this issue 1 year ago • 34 comments

Hello,

I tried to launch exo on different versions of JetPack but couldn't get a working exo on the Jetson Orin because the device_capabilities can't be discovered. If there is a way to use a Jetson with exo, any guide would be very helpful.

Thanks,

Giray

gyillikci avatar Oct 04 '24 08:10 gyillikci

I've never tried this but I don't see why it wouldn't work. Can you send the logs with DEBUG=6? What do you mean by "device_capabilities can't be discovered"?

AlexCheema avatar Oct 04 '24 12:10 AlexCheema

Hi Alex, this error message is from a Jetson Orin 16GB with a freshly installed JetPack 5.1.4 (all components):

```
~/exo$ DEBUG=6 exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

  (exo ASCII-art banner)

Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=62480
[]
Using available port: 62480
Generated and stored new node ID: 7ce7ce87-6cbd-4009-9c84-2a13ca626afc
Chat interface started:
  • http://127.0.0.1:8000
  • http://192.168.1.103:8000
  • http://172.17.0.1:8000
ChatGPT API endpoint served at:
  • http://127.0.0.1:8000/v1/chat/completions
  • http://192.168.1.103:8000/v1/chat/completions
  • http://172.17.0.1:8000/v1/chat/completions
tinygrad Device.DEFAULT='CUDA'
Traceback (most recent call last):
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1798, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/orin/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/home/orin/exo/exo/main.py", line 82, in <module>
    node = StandardNode(
  File "/home/orin/exo/exo/orchestration/standard_node.py", line 38, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 139, in device_capabilities
    return linux_device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 177, in linux_device_capabilities
    pynvml.nvmlInit()
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1770, in nvmlInit
    nvmlInitWithFlags(0)
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1753, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1800, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
╭──────── Exo Cluster (0 nodes) ────────╮ ╭──────── Download Progress ────────╮
```

The following error is from JetPack 6.1:

```
(.venv) orin@orin-desktop:~/exo$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

  (exo ASCII-art banner)

Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[]
Chat interface started:
  • http://172.17.0.1:8000
  • http://127.0.0.1:8000
  • http://192.168.1.103:8000
ChatGPT API endpoint served at:
  • http://172.17.0.1:8000/v1/chat/completions
  • http://127.0.0.1:8000/v1/chat/completions
  • http://192.168.1.103:8000/v1/chat/completions
Traceback (most recent call last):
  File "/home/orin/exo/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/home/orin/exo/exo/main.py", line 82, in <module>
    node = StandardNode(
  File "/home/orin/exo/exo/orchestration/standard_node.py", line 38, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 139, in device_capabilities
    return linux_device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 180, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
╭──────── Exo Cluster (0 nodes) ────────╮ ╭──────── Download Progress ────────╮
```

Here is the nvidia-smi output for JetPack 6.1:

image

gyillikci avatar Oct 04 '24 16:10 gyillikci

@gyillikci Did you make any progress on the Jetson? I'm looking forward to trying it on a bunch of Jetson Nano boards. Kindly let me know if you have made any progress. Thanks.

gigwegbe avatar Nov 18 '24 03:11 gigwegbe

Hi,

Unfortunately, I haven't tried it since that post, but the repo seems very active.

gyillikci avatar Nov 18 '24 09:11 gyillikci

Thanks for the feedback.

gigwegbe avatar Nov 18 '24 09:11 gigwegbe

Man, I'm also trying to run it on a Jetson AGX Orin. It looks like the best option would be to set up a "jetson-container", but no luck... I think the main issue is exo's requirement of Python>=3.12.0, while I think the latest GPU-aware Python version available for Jetson is 3.10...

ihubanov avatar Nov 22 '24 20:11 ihubanov

I was able to just use pyenv to install Python 3.12 (and create the virtualenv to add the torch deps) on a Jetson Nano 4GB and a Nano 2GB, and was able to build and run exo, but noticed I was missing libs.

I used jetson-container, but the Nanos only support up to L4T 32 (I tried to have jetson-container build using v36, but it just disregarded that and used v32 on Ubuntu 22.04), and I think libnvidia-ml was only included starting with L4T v36 or so, which is where I last left off.

MostHated avatar Dec 01 '24 01:12 MostHated

Any update on this? I'm looking to do the same: use multiple Jetson devices for cluster inference.

JC1738 avatar Dec 18 '24 18:12 JC1738

That would be cool; I have a couple of Nanos and an NX I could dust off.

Nurb4000 avatar Dec 18 '24 18:12 Nurb4000

I tried this today and got the same error. It seems that NVML isn't supported on Jetson devices. I found the following test script online:

import pynvml
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

It doesn't work and errors with pynvml.NVMLError_Uninitialized: Uninitialized

garyexplains avatar Dec 19 '24 15:12 garyexplains
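Editor's note: that particular `Uninitialized` error is what pynvml raises whenever an NVML call is made before `pynvml.nvmlInit()`, which the snippet above never calls. A small hypothetical probe along these lines (guarded so it runs even on machines where pynvml or the NVML library is absent) separates "forgot to initialise" from "NVML genuinely unavailable", the latter being the Jetson situation:

```python
def probe_nvml() -> str:
    """Return a short description of NVML availability on this host."""
    try:
        import pynvml
    except ImportError:
        return "pynvml not installed"
    try:
        # Every other NVML call requires nvmlInit() first; skipping it
        # is exactly what raises NVMLError_Uninitialized.
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # On JetPack 5.x this typically fails with "NVML Shared Library
        # Not Found", since libnvidia-ml.so.1 does not ship for Jetson.
        return f"NVML unavailable: {err}"
    try:
        return f"NVML OK, {pynvml.nvmlDeviceGetCount()} device(s)"
    finally:
        pynvml.nvmlShutdown()

print(probe_nvml())
```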

I think the main issue is that exo needs Python 3.12, which in turn needs newer libraries... Maybe a better approach would be to get someone from the Jetson team to build and release PyTorch for Python 3.12... 😔

ihubanov avatar Dec 19 '24 15:12 ihubanov

> I think the main issue is that exo needs Python 3.12, which in turn needs newer libraries... Maybe a better approach would be to get someone from the Jetson team to build and release PyTorch for Python 3.12... 😔

Lagging (or early-EOL) support from NVIDIA on those devices is why mine are collecting dust and why I got off that treadmill. I'm looking forward to future NPUs instead.

Nurb4000 avatar Dec 19 '24 16:12 Nurb4000

I was able to install Python 3.12 and set up a virtual environment. I got exo installed, but it won't run because of pynvml: python3-pynvml isn't available in the standard repos, even for older versions of Python 3. From reading around online, the NVIDIA Management Library (NVML) is a PC-only thing and isn't supported on Jetson.

garyexplains avatar Dec 19 '24 16:12 garyexplains

My guess is that when exo sees Linux and an NVIDIA GPU, it expects a PC, not a Jetson. It then uses NVML to do things, but, as I said, NVML isn't available on the Jetson platform.

garyexplains avatar Dec 19 '24 16:12 garyexplains
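Editor's note: telling a Jetson apart from a discrete-GPU Linux box doesn't actually require NVML. A sketch along these lines (an illustrative helper, not exo code; it assumes the usual JetPack markers, namely `/etc/nv_tegra_release` and the device-tree model string) would let a capability probe branch before ever touching pynvml:

```python
from pathlib import Path

def is_jetson() -> bool:
    """Heuristic Jetson detection via files JetPack normally installs."""
    # JetPack ships an L4T release marker file...
    if Path("/etc/nv_tegra_release").exists():
        return True
    # ...and the device tree exposes a model string such as
    # "NVIDIA Jetson AGX Orin Developer Kit".
    model = Path("/proc/device-tree/model")
    try:
        return model.exists() and "NVIDIA" in model.read_text(errors="ignore")
    except OSError:
        return False
```

On a regular x86 desktop this returns False; on a Jetson either marker should trip it.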

@garyexplains, I think so too: they seem to have targeted PC machines and Mac devices, but this project is very well suited to Jetson devices.

gyillikci avatar Dec 19 '24 18:12 gyillikci

Especially the new Orin Nano Super. It's on backorder now, but at 8GB for only $249 it's prime for some exo action.

MostHated avatar Dec 19 '24 20:12 MostHated

lol, just as that was posted I got an email from NVIDIA trying to sell me one: "The Most Affordable Generative AI Supercomputer". Perhaps they will support more mainstream toolkits now?

Nurb4000 avatar Dec 19 '24 20:12 Nurb4000

Getting "pynvml.NVMLError_NotSupported" error with NVIDIA Jetson Orin NX:

```
$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: None

  (exo ASCII-art banner)

Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[53754]
Chat interface started:
  • http://172.20.0.1:52415
  • http://172.19.0.1:52415
  • http://172.17.0.1:52415
  • http://192.168.1.204:52415
  • http://100.104.92.40:52415
  • http://127.0.0.1:52415
ChatGPT API endpoint served at:
  • http://172.20.0.1:52415/v1/chat/completions
  • http://172.19.0.1:52415/v1/chat/completions
  • http://172.17.0.1:52415/v1/chat/completions
  • http://192.168.1.204:52415/v1/chat/completions
  • http://100.104.92.40:52415/v1/chat/completions
  • http://127.0.0.1:52415/v1/chat/completions
Traceback (most recent call last):
  File "/home/nvidia/.local/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/home/nvidia/.local/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/nvidia/Projects/exo/exo/main.py", line 131, in <module>
    node = Node(
  File "/home/nvidia/Projects/exo/exo/orchestration/node.py", line 40, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 151, in device_capabilities
    return linux_device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 193, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
```

My environment is as follows: image

c9482 avatar Jan 01 '25 00:01 c9482
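Editor's note: once a node does come up, the log above shows exo serving an OpenAI-style chat endpoint. A minimal stdlib client sketch for talking to it (the model id and port here are placeholders taken from this thread, not verified values):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base: str = "http://127.0.0.1:52415",
                       model: str = "llama-3.2-1b") -> urllib.request.Request:
    """Build (but don't send) a ChatGPT-style completion request."""
    payload = {
        "model": model,  # placeholder id; use whatever your node actually serves
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# urllib.request.urlopen(build_chat_request("hello")) would send it to a live node.
```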

Here is a fix, tested on Jetson Orin AGX: https://github.com/D-G-Dimitrov/exo/commit/edf3bd54e7988ada0148ca429812c7f8ff76fe1f

...
    gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
    if gpu_raw_name == 'ORIN (NVGPU)': # In case of a Jetson device
      gpu_memory_info = get_jetson_device_meminfo()
    else:
      gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)

    if DEBUG >= 2: print(f"NVIDIA device {gpu_name=} {gpu_memory_info=}")
...

def get_jetson_device_meminfo():
  from re import search
  from pynvml import c_nvmlMemory_t

  def extract_numeric_value(text):
    """Extract the first numeric value from a string."""
    match = search(r'\d+', text)
    return int(match.group()) if match else 0

  # Read total and free memory from /proc/meminfo
  with open("/proc/meminfo") as fp:
    total_memory = extract_numeric_value(fp.readline())
    free_memory = extract_numeric_value(fp.readline())

  # Calculate used memory
  used_memory = total_memory - free_memory

  # Return memory info object
  return c_nvmlMemory_t(
    total=total_memory * 1000,
    free=free_memory * 1000,
    used=used_memory * 1000
  )

D-G-Dimitrov avatar Jan 03 '25 16:01 D-G-Dimitrov
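Editor's note on the `/proc/meminfo` part of the fix: the kernel reports those values in kB (1024-byte units), and the first two lines are MemTotal and MemFree on current kernels, but parsing by field name is cheap insurance against ordering changes. A hypothetical standalone variant for illustration:

```python
import re

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

sample = "MemTotal:       16303864 kB\nMemFree:         1156224 kB\n"
info = parse_meminfo(sample)
print(info["MemTotal"] - info["MemFree"])  # used memory in kB -> 15147640
```

Multiplying by 1024 rather than 1000 when converting to bytes would match the kernel's units slightly more closely, though for capability reporting the difference is cosmetic.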

@D-G-Dimitrov I confirm that this works! Thank you so much.

c9482 avatar Jan 03 '25 23:01 c9482

I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues.

image

c9482 avatar Jan 04 '25 20:01 c9482

> I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues.
>
> image

Same for my Orin Nano. Did you solve this issue?

kiangyw avatar Jan 24 '25 09:01 kiangyw

> > I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues. image
>
> Same for my Orin Nano. Did you solve this issue?

I did not.

c9482 avatar Jan 24 '25 16:01 c9482

> > > I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues. image
> >
> > Same for my Orin Nano. Did you solve this issue?
>
> I did not.

Looks like there is no "easy" way to calculate the TFLOPS rating of a device, so it was just hardcoded, like this: https://github.com/exo-explore/exo/blob/aa1ce21f820d8412dfac95957450c647aa7373d5/exo/topology/device_capabilities.py#L58 I think the TFLOPS rating is just UI "sugar"; it should not prevent the app from running.

D-G-Dimitrov avatar Jan 24 '25 16:01 D-G-Dimitrov
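Editor's note: since the linked code is a static name-to-TFLOPS lookup, adding a Jetson entry is a one-liner. A sketch of the idea (the entry names and figures below are rough public FP32 specs for illustration, not values taken from exo's actual table):

```python
# Illustrative static lookup; figures are approximate FP32 TFLOPS from
# public spec sheets, not exo's own numbers.
CHIP_FLOPS = {
    "NVIDIA GEFORCE RTX 4090": 82.58,
    "ORIN (NVGPU)": 5.3,  # Jetson AGX Orin GPU, approximate
}

def lookup_tflops(gpu_name: str, default: float = 0.0) -> float:
    # Unknown devices fall back to `default`, which shows up as a
    # missing TFLOPS rating in the UI but shouldn't block execution.
    return CHIP_FLOPS.get(gpu_name.upper(), default)

print(lookup_tflops("Orin (NVGPU)"))  # -> 5.3
```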

Just check to see if the GPU is really being used; then who cares about the 'indicator'?

Nurb4000 avatar Jan 24 '25 16:01 Nurb4000

For me, the Orin is not utilising the GPU; it's slow even running Llama 1B.

bingo619 avatar Feb 13 '25 09:02 bingo619

Yeah, I noticed that as well.

I tried the same model via exo (just a single node, testing it solo) and via Ollama. Monitoring both with jtop, I could see that each loaded the model at a similar size (not at the same time; I ran one, restarted, then ran the other, to make sure everything was unloaded).

I was getting around 3-4 tokens per second via exo, and 35-40 on Ollama.

MostHated avatar Feb 15 '25 18:02 MostHated

Is it because I did not install PyTorch or TensorFlow? I was only able to install Flax.

bingo619 avatar Feb 17 '25 06:02 bingo619

Yeah, that makes sense if you were using Flax. When I was testing, I already had PyTorch installed (built on-device with GPU enabled, using a script from NVIDIA that I found on their forum), but mine was/is still slow.

I am having trouble doing additional testing, though. After moving the models directory over to the SSD and setting the EXO_HOME var, I re-downloaded the Llama 3.2 1B model and went to test; now it keeps trying to load the model multiple times, fills up the device's memory, and locks up every time.

I wish there were a way to make exo utilize Ollama, since Ollama works perfectly on all the devices I own (an NVIDIA GPU in a container, an AMD video card in my main PC, and multiple different Jetson devices) without requiring any changes.

MostHated avatar Feb 17 '25 17:02 MostHated

I've had a Jetson cluster for some time, with a Jetson Mate carrier board. NVIDIA's only GPU-enabled distributed-workload guides were via Slurm, or GPU-enabled Docker containers in a Kubernetes cluster. Those are suited to orchestrating AI workloads rather than to running LLMs.

But there is still no mature solution for taking advantage of Jetsons as a cluster.

What have you guys tried? exo would have been perfect… if it worked.

gavan-x avatar Mar 24 '25 00:03 gavan-x