running on a Jetson Orin NX
Good morning,
I have been trying to make the exo project work on my Orin NX without success. Here is the error I get when running exo:
(exo) sgoudelis@jetson:~/projects/exo$ exo
Selected inference engine: None
_____ _____
/ _ \ \/ / _ \
| __/> < (_) |
\___/_/\_\___/
Detected system: Linux
Inference engine name after selection: tinygrad
Traceback (most recent call last):
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 25, in importlib_load_entry_point
return next(matches).load()
^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/metadata/__init__.py", line 205, in load
module = import_module(match.group('module'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/home/sgoudelis/projects/exo/exo/main.py", line 106, in <module>
inference_engine = get_inference_engine(inference_engine_name, shard_downloader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/inference/inference_engine.py", line 69, in get_inference_engine
from exo.inference.tinygrad.inference import TinygradDynamicShardInferenceEngine
File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/inference.py", line 4, in <module>
from exo.inference.tinygrad.models.llama import Transformer, TransformerShard, convert_from_huggingface, fix_bf16, sample_logits
File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/models/llama.py", line 2, in <module>
from tinygrad import Tensor, Variable, TinyJit, dtypes, nn, Device
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/__init__.py", line 5, in <module>
from tinygrad.tensor import Tensor # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/tensor.py", line 12, in <module>
from tinygrad.device import Device, BufferSpec
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 226, in <module>
class CPUProgram:
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 227, in CPUProgram
helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'kernel32' if sys.platform == "win32" else 'gcc_s'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: invalid ELF header
Looking into the .so file, I get this:
(exo) sgoudelis@jetson:~/projects/exo$ file /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: ASCII text
(exo) sgoudelis@jetson:~/projects/exo$ more /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
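That `GROUP ( libgcc_s.so.1 -lgcc )` line is a GNU ld linker script, not a real shared object: the compile-time linker understands it, but `dlopen` (which `ctypes.CDLL` uses) does not, hence the "invalid ELF header". A quick sketch for telling the two apart (this helper is my own, not part of exo):

```shell
# Distinguish a real ELF shared object from a GNU ld script: ELF files start
# with the 4-byte magic 0x7f 'E' 'L' 'F'; ld scripts are plain ASCII text.
is_elf() {
  if [ "$(head -c 4 "$1" | tail -c 3)" = "ELF" ]; then
    echo "ELF"
  else
    echo "ld script"
  fi
}
```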
Does anyone have any idea how to make exo work on the Jetson Orin?
UPDATE:
Moving the mentioned file (which is a linker script, not a static object) out of the way actually gets exo further. It then fails in a different way:
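For reference, the workaround was roughly the following (paths taken from the traceback above; renaming rather than deleting, so the loader falls back to the real libgcc_s.so.1 and the change is reversible — adjust the path for your own env):

```shell
# Rename the conda-shipped ld script so ctypes.CDLL('gcc_s') resolves the
# real libgcc_s.so.1 instead of choking on the ASCII linker script.
LIB="$HOME/miniconda3/envs/exo/lib/libgcc_s.so"
if [ -f "$LIB" ]; then
  mv "$LIB" "$LIB.bak"
fi
```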
Traceback (most recent call last):
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/main.py", line 385, in run
loop.run_until_complete(main())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/sgoudelis/projects/exo/exo/main.py", line 349, in main
await node.start(wait_for_peers=args.wait_for_peers)
File "/home/sgoudelis/projects/exo/exo/orchestration/node.py", line 59, in start
self.device_capabilities = await device_capabilities()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 153, in device_capabilities
return await linux_device_capabilities()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 188, in linux_device_capabilities
gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
_nvmlCheckReturn(ret)
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
I am a complete noob when it comes to NVIDIA CUDA stuff, by the way. I am guessing this happens because the Orin has unified (shared CPU/GPU) memory.
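That guess seems plausible: on Jetson boards the GPU shares system RAM, and NVML's per-device memory query raises `NVMLError_NotSupported` there. A hedged sketch of a fallback (the helper name and the idea of substituting total system RAM for VRAM are mine, not exo's actual code):

```python
def unified_memory_mb(meminfo_text: str) -> int:
    """Parse MemTotal from /proc/meminfo text and return it in MB.

    On unified-memory boards like the Orin NX, total system RAM is a
    reasonable stand-in for the VRAM figure NVML refuses to report.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024  # /proc/meminfo reports kB
    return 0

# Hypothetical use inside something like linux_device_capabilities():
# try:
#     mem_mb = pynvml.nvmlDeviceGetMemoryInfo(handle).total // (1024 * 1024)
# except pynvml.NVMLError:
#     with open("/proc/meminfo") as f:
#         mem_mb = unified_memory_mb(f.read())
```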
ANOTHER UPDATE:
Exo does work on the Orin NX 16GB: bypassing the part of the code that queries the VRAM amount and feeding it a bogus number makes exo boot up just fine, with GPU-accelerated inference.
I would love some feedback from one of the developers of the exo project about this. Please feel free to comment.