jetson-containers
Unable to run Llamaspeak on Jetson Orin NX 16GB
Hi, I'm trying to run Llamaspeak following the instructions at https://www.jetson-ai-lab.com/tutorial_llamaspeak.html.
Specs: Jetson Orin NX (16GB) Developer Kit, JetPack 6.0 [L4T 36.3.0]
The Riva server is up and running, and the ASR and TTS examples work just fine.
When I run the following command in /path/to/jetson-containers:
root@ubuntu:/# python3 -m nano_llm.agents.web_chat --api=mlc --model /data/models/Meta-Llama-3-8B-Instruct/ --asr=riva --tts=piper
the response is:
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
06:43:49 | INFO | loading /data/models/Meta-Llama-3-8B-Instruct/ with MLC
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06:43:51 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/ --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 8192 --artifact-path /data/models/mlc/dist/-ctx8192
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 78, in __init__
    self.model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 73, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 271, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/ --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 8192 --artifact-path /data/models/mlc/dist/-ctx8192 ' returned non-zero exit status 1.
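In case it matters for debugging: the CalledProcessError at the bottom only reports the exit status, while the real mlc_llm.build error is printed to the console just above it. A minimal, generic Python sketch (not NanoLLM's actual code) of how a failing subprocess's stderr can be captured instead of just its return code:

```python
import subprocess

# Stand-in for the failing quantization command -- this one just exits
# non-zero with a message on stderr.
cmd = "python3 -c 'import sys; sys.exit(\"boom\")'"

# capture_output=True collects stdout/stderr; text=True decodes them as str.
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

if result.returncode != 0:
    # The captured stderr carries the underlying error message.
    print("command failed with:", result.stderr.strip())
```

Running the actual quantization command from the log the same way would show the underlying MLC build error rather than just "returned non-zero exit status 1".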
Any help or recommendations are appreciated.