
What's --chat-template, and how do I configure it?

Hi, I tried to use Llama2-Chinese-13b-Chat, but got this error:

File "/opt/local_llm/local_llm/history.py", line 89, in init raise RuntimeError(f"Couldn't automatically determine model type from {model.config.name}, please set the --chat-template argument") RuntimeError: Couldn't automatically determine model type from Llama2-Chinese-13b-Chat, please set the --chat-template argument.

and I tried this, but with no success:

python3 -m local_llm.agents.web_chat \
  --model /data/models/mlc/dist/models/Llama2-Chinese-13b-Chat \
  --api=mlc --verbose --chat-template = "User: {user_input}\nAI: {model_response}"

What's --chat-template? Where and how do I set it? Thank you!

UserName-wang avatar Feb 19 '24 01:02 UserName-wang

Hi @UserName-wang, --chat-template should be one of the ChatTemplate dict keys from https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/local_llm/templates.py (or a new dict defining a custom template)

Presuming this model follows the same chat template as the original llama-2-chat models, can you try running it with --chat-template=llama-2 ?
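For reference, each entry in that dict is a plain dict of per-role templates built around a ${MESSAGE} placeholder (the same templates that show up in the debug logs later in this thread). A custom one might look roughly like the sketch below; the key names here are illustrative assumptions, so check templates.py for the real definitions:

ChatTemplates = {
    'my-custom-llama': {
        # system prompt wrapper (llama-2 style)
        'system': '<s>[INST] <<SYS>>\n${MESSAGE}\n<</SYS>>\n\n',
        # user turns, as seen in the local_llm debug output
        'user': '[INST] ${MESSAGE} [/INST]',
        # bot replies, which the model terminates with </s>
        'bot': ' ${MESSAGE}',
    },
}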

dusty-nv avatar Feb 19 '24 06:02 dusty-nv

Hi @dusty-nv, I tried your suggestion and got these error messages:

RuntimeError: Couldn't automatically determine model type from Llama2-Chinese-13b-Chat, please set the --chat-template argument
bash: --chat-template: command not found

I tried --chat-template=llama-2, --chat-template='llama-2', and --chat-template llama-2. Then I modified templates.py to force it to use llama-2:

else:
    # return None
    chat_template = 'llama-2'

and the web UI started successfully.

UserName-wang avatar Feb 19 '24 10:02 UserName-wang

Hi @UserName-wang, glad to hear it started. Do the Chinese characters display appropriately?

The bash: --chat-template: command not found error suggests the shell ran --chat-template as a separate command, which happens when the backslash line continuations are missing. Try running the command including --chat-template like this:

python3 -m local_llm.agents.web_chat \
--model /data/models/mlc/dist/models/Llama2-Chinese-13b-Chat \
--api=mlc --verbose \
--chat-template llama-2

dusty-nv avatar Feb 20 '24 02:02 dusty-nv

Dear @dusty-nv, thank you for your continued support!

When I tried yesterday, some Chinese characters did not display appropriately, and I could only prompt from the terminal. But with your new suggestion I can use the web UI for Chinese conversation, and the Chinese characters display appropriately.

But the web UI does not yet offer a Chinese speaker. I already updated Riva's TTS and ASR models to support Chinese conversation, and I'm reading Riva's manual to figure out how to add a Chinese speaker. Can you give me some hints on adding a Chinese speaker to the list in the web UI?

UserName-wang avatar Feb 20 '24 03:02 UserName-wang

Hi @dusty-nv, I already found the different voices for the different language models: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html?highlight=english%20us%20female and I need to add the voice in index.html, for example:
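something along these lines, where the value has to match a voice name that your deployed Riva TTS models actually expose (Mandarin-CN.Female-1 is only a guess at the naming):

<option value="Mandarin-CN.Female-1">Mandarin-CN.Female-1</option>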

But the AGX Xavier that the Riva server is running on automatically shuts down when I try to talk. I tried twice and the same issue happened. Maybe it's because of the CUDA driver? There is no issue with English voice chat.

UserName-wang avatar Feb 20 '24 13:02 UserName-wang

Hi @UserName-wang, are you able to test RIVA independently first with your desired voice, like shown here (see also the sketch after these links)?

  • https://github.com/dusty-nv/jetson-containers/tree/master/packages/audio/riva-client
  • https://github.com/nvidia-riva/python-clients#asr
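A minimal standalone TTS check with the Riva Python client could look like the sketch below. It assumes the Riva server is reachable at localhost:50051, and Mandarin-CN.Female-1 is only a placeholder for whatever voice name your deployed TTS models actually expose:

import riva.client

# connect to the Riva server (address is an assumption, adjust as needed)
auth = riva.client.Auth(uri='localhost:50051')
tts = riva.client.SpeechSynthesisService(auth)

# request speech for a short Mandarin test phrase
response = tts.synthesize(
    '你好,这是一个测试。',
    voice_name='Mandarin-CN.Female-1',   # placeholder voice name
    language_code='zh-CN',
    sample_rate_hz=44100,
)

# response.audio holds headerless PCM samples; save them raw to confirm audio came back
with open('tts_test.raw', 'wb') as f:
    f.write(response.audio)
print(f'received {len(response.audio)} bytes of audio')

If this direct request reproduces the failure, the problem is in the Riva deployment rather than in local_llm.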

For the applications in the local_llm container, you should be able to set the TTS voice string when starting the Python program with the --voice command-line option (and for ASR, the --language-code option). For the web UI, you would need to add your string to index.html here:

https://github.com/dusty-nv/jetson-containers/blob/aba771949940ba3e7f1deacf8b976350519ecb01/packages/llm/local_llm/web/templates/index.html#L217
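For example, a Mandarin launch might look like this (the voice name is again just a placeholder for whatever your Riva deployment provides):

python3 -m local_llm.agents.web_chat \
  --model /data/models/mlc/dist/models/Llama2-Chinese-13b-Chat \
  --api=mlc \
  --chat-template llama-2 \
  --voice Mandarin-CN.Female-1 \
  --language-code zh-CN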

To make it easier to edit the local_llm source files locally and have them reflected in the running container (without having to always rebuild the container), you can just mount your local source files into the container when starting it:

./run.sh \
  -v /path/to/your/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  $(./autotag local_llm)

Then you can interactively edit the source files and index.html in your local git tree, and the changes will show up inside the container.

dusty-nv avatar Feb 20 '24 18:02 dusty-nv

Dear @dusty-nv, I performed the test for RIVA independently and it succeeded, and I modified index.html. I can chat in Chinese, and ASR and TTS work for the first several rounds, but it seems unstable. After several rounds I got this error message:

02:57:50 | DEBUG | sending ASR keep-alive silence (idle for 99 seconds)
02:58:11 | DEBUG | processing chat entry 8 role='bot' template=' ${MESSAGE}' open_user_prompt=False cached=false text=' "嗯,你好,我可以帮助你完成任务,例如写作业、编辑文章、提供信息和资讯等。"\n\n"嗯,你好,我可以帮助你完成任务,例如写作业、编辑文章、提供信息和资讯等。"\n\n"嗯,你好,我可以帮助你完成任务,例如'
02:58:11 | DEBUG | embedding text (1, 129, 5120) float16 -> "嗯,你好,我可以帮助你完成任务,例如写作业、编辑文章、提供信息和资讯等。"\n\n"嗯,你好,我可以帮助你完成任务,例如写作业、编辑文章、提供信息和资讯等。"\n\n"嗯,你好,我可以帮助你完成任务,例如</s>
02:58:11 | DEBUG | processing chat entry 9 role='user' template='[INST] ${MESSAGE} [/INST]' open_user_prompt=False cached=false text='如何改善睡眠质量'
02:58:11 | DEBUG | embedding text (1, 25, 5120) float16 -> <s>[INST] 如何改善睡眠质量 [/INST]
02:58:12 | DEBUG | generating TTS for ' 1.'
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/orin/study/llm/web_ui/local_llm/plugin.py", line 176, in run
    self.dispatch(self.input_queue.get(block=False))
  File "/home/orin/study/llm/web_ui/local_llm/plugin.py", line 189, in dispatch
    outputs = self.process(input)
  File "/home/orin/study/llm/web_ui/local_llm/plugins/tts.py", line 83, in process
    for response in responses:
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 540, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Error: Triton model failed during inference. Error message: in ensemble 'fastpitch_hifigan_ensemble-Mandarin-CN', Preprocessor failed to transform input string"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Error: Triton model failed during inference. Error message: in ensemble 'fastpitch_hifigan_ensemble-Mandarin-CN', Preprocessor failed to transform input string", grpc_status:2, created_time:"2024-02-21T02:58:12.692433885+00:00"}"

It seems the Riva server failed to transform the input string.

UserName-wang avatar Feb 21 '24 03:02 UserName-wang