
Support for Jetson?

Open gyillikci opened this issue 1 year ago • 34 comments

Hello,

I tried to launch exo on different versions of JetPack but couldn't get a working exo on the Jetson Orin because the device_capabilities can't be discovered. If there is a way to use a Jetson with exo, any guide would be very helpful.

Thanks,

Giray

gyillikci avatar Oct 04 '24 08:10 gyillikci

I've never tried this but I don't see why it wouldn't work. Can you send the logs with DEBUG=6? What do you mean by "device_capabilities can't be discovered"?

AlexCheema avatar Oct 04 '24 12:10 AlexCheema

Hi Alex, this error message is from a Jetson Orin 16GB with a freshly installed JetPack 5.1.4 (all components):

```
~/exo$ DEBUG=6 exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

  (exo ASCII-art banner)

Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=62480
[]
Using available port: 62480
Generated and stored new node ID: 7ce7ce87-6cbd-4009-9c84-2a13ca626afc
Chat interface started:
  • http://127.0.0.1:8000
  • http://192.168.1.103:8000
  • http://172.17.0.1:8000
ChatGPT API endpoint served at:
  • http://127.0.0.1:8000/v1/chat/completions
  • http://192.168.1.103:8000/v1/chat/completions
  • http://172.17.0.1:8000/v1/chat/completions
tinygrad Device.DEFAULT='CUDA'
Traceback (most recent call last):
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1798, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/orin/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/home/orin/exo/exo/main.py", line 82, in <module>
    node = StandardNode(
  File "/home/orin/exo/exo/orchestration/standard_node.py", line 38, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 139, in device_capabilities
    return linux_device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 177, in linux_device_capabilities
    pynvml.nvmlInit()
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1770, in nvmlInit
    nvmlInitWithFlags(0)
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1753, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 1800, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/home/orin/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
╭──────── Exo Cluster (0 nodes) ────────╮ ╭──────── Download Progress ────────╮
```

The following error is from JetPack 6.1:

```
(.venv) orin@orin-desktop:~/exo$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

  (exo ASCII-art banner)

Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[]
Chat interface started:
  • http://172.17.0.1:8000
  • http://127.0.0.1:8000
  • http://192.168.1.103:8000
ChatGPT API endpoint served at:
  • http://172.17.0.1:8000/v1/chat/completions
  • http://127.0.0.1:8000/v1/chat/completions
  • http://192.168.1.103:8000/v1/chat/completions
Traceback (most recent call last):
  File "/home/orin/exo/.venv/bin/exo", line 5, in <module>
    from exo.main import run
  File "/home/orin/exo/exo/main.py", line 82, in <module>
    node = StandardNode(
  File "/home/orin/exo/exo/orchestration/standard_node.py", line 38, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 139, in device_capabilities
    return linux_device_capabilities()
  File "/home/orin/exo/exo/topology/device_capabilities.py", line 180, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/orin/exo/.venv/lib/python3.12/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
╭──────── Exo Cluster (0 nodes) ────────╮ ╭──────── Download Progress ────────╮
```

Here is the nvidia-smi output for JetPack 6.1:

image

gyillikci avatar Oct 04 '24 16:10 gyillikci

@gyillikci Did you make any progress on the Jetson? I'm looking forward to trying it on a bunch of Jetson Nano boards. Kindly let me know if you have made any progress. Thanks.

gigwegbe avatar Nov 18 '24 03:11 gigwegbe

Hi,

Unfortunately, I haven't tried it since that post, but the repo seems very active.

gyillikci avatar Nov 18 '24 09:11 gyillikci

Thanks for the feedback.

gigwegbe avatar Nov 18 '24 09:11 gigwegbe

Man, I'm also trying to run it on a Jetson AGX Orin. It looks like the best option would be to set up a "jetson-container", but no luck... I think the main issue is exo's requirement of Python>=3.12.0, while I think the latest GPU-aware Python version available for Jetson is 3.10...

ihubanov avatar Nov 22 '24 20:11 ihubanov

I was able to just use pyenv to install Python 3.12 (and create the virtualenv to add the torch deps) on a Jetson Nano 4GB and a Nano 2GB, and was able to build and run exo, but noticed I was missing libs.

I used jetson-container, but the Nanos only support up to L4T 32 (I tried to have jetson-container build using v36, but it just disregarded that and used v32 on Ubuntu 22.04), and I think libnvidia-ml was only included starting with L4T v36 or so, which is where I last left off.

MostHated avatar Dec 01 '24 01:12 MostHated

Any update on this? I'm looking to do the same: use multiple Jetson devices for cluster inference.

JC1738 avatar Dec 18 '24 18:12 JC1738

That would be cool; I have a couple of Nanos and an NX I could dust off.

Nurb4000 avatar Dec 18 '24 18:12 Nurb4000

I tried this today and got the same error. It seems that NVML isn't supported on Jetson devices. I found the following test script online:

import pynvml
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

It doesn't work and errors with pynvml.NVMLError_Uninitialized: Uninitialized

garyexplains avatar Dec 19 '24 15:12 garyexplains
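Editor's note: that particular `Uninitialized` error is what pynvml raises whenever an NVML call is made before `pynvml.nvmlInit()`, which the snippet above never calls. A small hypothetical probe along these lines (guarded so it runs even on machines where pynvml or the NVML library is absent) separates "forgot to initialise" from "NVML genuinely unavailable", the latter being the Jetson situation:

```python
def probe_nvml() -> str:
    """Return a short description of NVML availability on this host."""
    try:
        import pynvml
    except ImportError:
        return "pynvml not installed"
    try:
        # Every other NVML call requires nvmlInit() first; skipping it
        # is exactly what raises NVMLError_Uninitialized.
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # On JetPack 5.x this typically fails with "NVML Shared Library
        # Not Found", since libnvidia-ml.so.1 does not ship for Jetson.
        return f"NVML unavailable: {err}"
    try:
        return f"NVML OK, {pynvml.nvmlDeviceGetCount()} device(s)"
    finally:
        pynvml.nvmlShutdown()

print(probe_nvml())
```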

I think the main issue is that exo needs Python 3.12, which in turn needs newer libraries... Maybe a better approach would be to get someone from the Jetson team to build and release PyTorch for Python 3.12... 😔

ihubanov avatar Dec 19 '24 15:12 ihubanov

> I think the main issue is that exo needs Python 3.12, which in turn needs newer libraries... Maybe a better approach would be to get someone from the Jetson team to build and release PyTorch for Python 3.12... 😔

Lagging (or early-EOL) support from NVIDIA on those devices is why mine are collecting dust and why I got off that treadmill. I'm looking forward to future NPUs instead.

Nurb4000 avatar Dec 19 '24 16:12 Nurb4000

I was able to install Python 3.12 and set up a virtual environment. I got exo installed, but it won't run because of pynvml: python3-pynvml isn't available in the standard repos, even for older versions of Python 3. From reading around online, the NVIDIA Management Library (NVML) is a PC-only thing and isn't supported on Jetson.

garyexplains avatar Dec 19 '24 16:12 garyexplains

My guess is that when exo sees Linux and an NVIDIA GPU, it expects a PC, not a Jetson. It then uses NVML to do things, but, as I said, NVML isn't available on the Jetson platform.

garyexplains avatar Dec 19 '24 16:12 garyexplains
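Editor's note: telling a Jetson apart from a discrete-GPU Linux box doesn't actually require NVML. A sketch along these lines (an illustrative helper, not exo code; it assumes the usual JetPack markers, namely `/etc/nv_tegra_release` and the device-tree model string) would let a capability probe branch before ever touching pynvml:

```python
from pathlib import Path

def is_jetson() -> bool:
    """Heuristic Jetson detection via files JetPack normally installs."""
    # JetPack ships an L4T release marker file...
    if Path("/etc/nv_tegra_release").exists():
        return True
    # ...and the device tree exposes a model string such as
    # "NVIDIA Jetson AGX Orin Developer Kit".
    model = Path("/proc/device-tree/model")
    try:
        return model.exists() and "NVIDIA" in model.read_text(errors="ignore")
    except OSError:
        return False
```

On a regular x86 desktop this returns False; on a Jetson either marker should trip it.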

@garyexplains, I think so too: they seem to have targeted PC machines and Mac devices, but this project is very well suited to Jetson devices.

gyillikci avatar Dec 19 '24 18:12 gyillikci

Especially the new Orin Nano Super. It's on backorder now, but at 8GB for only $249 it's prime for some exo action.

MostHated avatar Dec 19 '24 20:12 MostHated

lol, just as that was posted I got an email from NVIDIA trying to sell me one: "The Most Affordable Generative AI Supercomputer". Perhaps they will support more mainstream toolkits now?

Nurb4000 avatar Dec 19 '24 20:12 Nurb4000

Getting "pynvml.NVMLError_NotSupported" error with NVIDIA Jetson Orin NX:

```
$ exo
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: None

  (exo ASCII-art banner)

Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[53754]
Chat interface started:
  • http://172.20.0.1:52415
  • http://172.19.0.1:52415
  • http://172.17.0.1:52415
  • http://192.168.1.204:52415
  • http://100.104.92.40:52415
  • http://127.0.0.1:52415
ChatGPT API endpoint served at:
  • http://172.20.0.1:52415/v1/chat/completions
  • http://172.19.0.1:52415/v1/chat/completions
  • http://172.17.0.1:52415/v1/chat/completions
  • http://192.168.1.204:52415/v1/chat/completions
  • http://100.104.92.40:52415/v1/chat/completions
  • http://127.0.0.1:52415/v1/chat/completions
Traceback (most recent call last):
  File "/home/nvidia/.local/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/home/nvidia/.local/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/nvidia/Projects/exo/exo/main.py", line 131, in <module>
    node = Node(
  File "/home/nvidia/Projects/exo/exo/orchestration/node.py", line 40, in __init__
    self.device_capabilities = device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 151, in device_capabilities
    return linux_device_capabilities()
  File "/home/nvidia/Projects/exo/exo/topology/device_capabilities.py", line 193, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/nvidia/.local/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
```

My environment is as follows: image

c9482 avatar Jan 01 '25 00:01 c9482
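Editor's note: once a node does come up, the log above shows exo serving an OpenAI-style chat endpoint. A minimal stdlib client sketch for talking to it (the model id and port here are placeholders taken from this thread, not verified values):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base: str = "http://127.0.0.1:52415",
                       model: str = "llama-3.2-1b") -> urllib.request.Request:
    """Build (but don't send) a ChatGPT-style completion request."""
    payload = {
        "model": model,  # placeholder id; use whatever your node actually serves
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# urllib.request.urlopen(build_chat_request("hello")) would send it to a live node.
```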

Here is a fix, tested on Jetson Orin AGX: https://github.com/D-G-Dimitrov/exo/commit/edf3bd54e7988ada0148ca429812c7f8ff76fe1f

...
    gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
    if gpu_raw_name == 'ORIN (NVGPU)': # In case of a Jetson device
      gpu_memory_info = get_jetson_device_meminfo()
    else:
      gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)

    if DEBUG >= 2: print(f"NVIDIA device {gpu_name=} {gpu_memory_info=}")
...

def get_jetson_device_meminfo():
  from re import search
  from pynvml import c_nvmlMemory_t

  def extract_numeric_value(text):
    """Extract the first numeric value from a string."""
    match = search(r'\d+', text)
    return int(match.group()) if match else 0

  # Read total and free memory from /proc/meminfo
  with open("/proc/meminfo") as fp:
    total_memory = extract_numeric_value(fp.readline())
    free_memory = extract_numeric_value(fp.readline())

  # Calculate used memory
  used_memory = total_memory - free_memory

  # Return memory info object
  return c_nvmlMemory_t(
    total=total_memory * 1000,
    free=free_memory * 1000,
    used=used_memory * 1000
  )

D-G-Dimitrov avatar Jan 03 '25 16:01 D-G-Dimitrov
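Editor's note on the `/proc/meminfo` part of the fix: the kernel reports those values in kB (1024-byte units), and the first two lines are MemTotal and MemFree on current kernels, but parsing by field name is cheap insurance against ordering changes. A hypothetical standalone variant for illustration:

```python
import re

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

sample = "MemTotal:       16303864 kB\nMemFree:         1156224 kB\n"
info = parse_meminfo(sample)
print(info["MemTotal"] - info["MemFree"])  # used memory in kB -> 15147640
```

Multiplying by 1024 rather than 1000 when converting to bytes would match the kernel's units slightly more closely, though for capability reporting the difference is cosmetic.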

@D-G-Dimitrov I confirm that this works! Thank you so much.

c9482 avatar Jan 03 '25 23:01 c9482

I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues.

image

c9482 avatar Jan 04 '25 20:01 c9482

> I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues.
>
> image

Same for my Orin Nano. Did you solve this issue?

kiangyw avatar Jan 24 '25 09:01 kiangyw

> > I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues. image
>
> Same for my Orin Nano. Did you solve this issue?

I did not.

c9482 avatar Jan 24 '25 16:01 c9482

> > > I am not seeing TFLOPS rating for the NVIDIA ORIN box. Is this an indication that some component is missing? I have Ollama with GPU support installed and running on the same box with no issues. image
> >
> > Same for my Orin Nano. Did you solve this issue?
>
> I did not.

Looks like there is no "easy" way to calculate the TFLOPS rating of a device, so it was just hardcoded, like this: https://github.com/exo-explore/exo/blob/aa1ce21f820d8412dfac95957450c647aa7373d5/exo/topology/device_capabilities.py#L58 I think the TFLOPS rating is just UI "sugar"; it should not prevent the app from running.

D-G-Dimitrov avatar Jan 24 '25 16:01 D-G-Dimitrov
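Editor's note: since the linked code is a static name-to-TFLOPS lookup, adding a Jetson entry is a one-liner. A sketch of the idea (the entry names and figures below are rough public FP32 specs for illustration, not values taken from exo's actual table):

```python
# Illustrative static lookup; figures are approximate FP32 TFLOPS from
# public spec sheets, not exo's own numbers.
CHIP_FLOPS = {
    "NVIDIA GEFORCE RTX 4090": 82.58,
    "ORIN (NVGPU)": 5.3,  # Jetson AGX Orin GPU, approximate
}

def lookup_tflops(gpu_name: str, default: float = 0.0) -> float:
    # Unknown devices fall back to `default`, which shows up as a
    # missing TFLOPS rating in the UI but shouldn't block execution.
    return CHIP_FLOPS.get(gpu_name.upper(), default)

print(lookup_tflops("Orin (NVGPU)"))  # -> 5.3
```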

Just check to see if the GPU is really being used; then who cares about the 'indicator'?

Nurb4000 avatar Jan 24 '25 16:01 Nurb4000

For me, the Orin is not utilising the GPU; it's slow even running Llama 1B.

bingo619 avatar Feb 13 '25 09:02 bingo619

Yeah, I noticed that as well.

I tried the same model via exo (just a single node, testing it solo) and via Ollama. Monitoring both with jtop, I could see that each loaded the model at a similar size (not at the same time; I ran one, restarted, then ran the other, to make sure everything was unloaded).

I was getting around 3-4 tokens per second via exo, and 35-40 on Ollama.

MostHated avatar Feb 15 '25 18:02 MostHated

Is it because I did not install PyTorch or TensorFlow? I was only able to install Flax.

bingo619 avatar Feb 17 '25 06:02 bingo619

Yeah, that makes sense if you were using Flax. When I was testing, I already had PyTorch installed (built on-device with GPU enabled, using a script from NVIDIA that I found on their forum), but mine was/is still slow.

I am having trouble doing additional testing, though. After moving the models directory over to the SSD and setting the EXO_HOME var, I re-downloaded the Llama 3.2 1B model and went to test; now it keeps trying to load the model multiple times, fills up the device's memory, and locks up every time.

I wish there were a way to make exo utilize Ollama, since Ollama works perfectly on all the devices I own (an NVIDIA GPU in a container, an AMD video card in my main PC, and multiple different Jetson devices) without requiring any changes.

MostHated avatar Feb 17 '25 17:02 MostHated

I've had a Jetson cluster for some time, with a Jetson Mate carrier board. NVIDIA's only GPU-enabled distributed-workload guides were via Slurm, or GPU-enabled Docker containers in a Kubernetes cluster. Those are suited to orchestrating AI workloads rather than to running LLMs.

But there is still no mature solution for taking advantage of Jetsons as a cluster.

What have you guys tried? exo would have been perfect… if it worked.

gavan-x avatar Mar 24 '25 00:03 gavan-x