jetson-containers
Unable to run Llamaspeak on Jetson Orin NX 16GB
Hi, I'm trying to run Llamaspeak following the instructions at https://www.jetson-ai-lab.com/tutorial_llamaspeak.html.
Specs: Jetson Orin NX (16GB) Developer Kit, JetPack 6.0 [L4T 36.3.0]
The Riva server is up and running, and the ASR and TTS examples work just fine.
When I run the following command in /path/to/jetson-containers:
root@ubuntu:/# python3 -m nano_llm.agents.web_chat --api=mlc --model /data/models/Meta-Llama-3-8B-Instruct/ --asr=riva --tts=piper
the response is:
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
06:43:49 | INFO | loading /data/models/Meta-Llama-3-8B-Instruct/ with MLC
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06:43:51 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/ --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 8192 --artifact-path /data/models/mlc/dist/-ctx8192
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 78, in __init__
    self.model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 73, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 271, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/ --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 8192 --artifact-path /data/models/mlc/dist/-ctx8192 ' returned non-zero exit status 1.
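In case it matters for debugging: the CalledProcessError at the bottom only reports the exit status, while the real mlc_llm.build error is printed to the console just above it. A minimal, generic Python sketch (not NanoLLM's actual code) of how a failing subprocess's stderr can be captured instead of just its return code:

```python
import subprocess

# Stand-in for the failing quantization command -- this one just exits
# non-zero with a message on stderr.
cmd = "python3 -c 'import sys; sys.exit(\"boom\")'"

# capture_output=True collects stdout/stderr; text=True decodes them as str.
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

if result.returncode != 0:
    # The captured stderr carries the underlying error message.
    print("command failed with:", result.stderr.strip())
```

Running the actual quantization command from the log the same way would show the underlying MLC build error rather than just "returned non-zero exit status 1".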
Any help or recommendations are appreciated.