NanoLLM

Meta-Llama-3-8B-Instruct always replies: You are a helpful and friendly AI assistant.<|eot_id|>

UserName-wang opened this issue 1 year ago • 6 comments

Dear @dusty-nv ,

I'm trying the example code on web page: Function Calling.

I tried both Llama-2-7b-chat-hf and Meta-Llama-3-8B-Instruct, and Llama-2-7b-chat-hf seems more reliable than Meta-Llama-3-8B-Instruct. Here are the replies when I'm using Meta-Llama-3-8B-Instruct:

hello You are a helpful and friendly AI assistant.<|eot_id|>

what is time current time? You are a helpful and friendly AI assistant<|eot_id|>

what's the current date? You are a helpful and friendly AI assistant<|eot_id|>

But even Llama-2-7b-chat-hf cannot invoke the given bot_function every time. When I'm using Llama-2-7b-chat-hf, I have to specify the function name in the prompt to increase the likelihood of it invoking the bot_function.

Also, how can I pass arguments to the bot_function in a more reliable way? Thank you! For reference, the pattern I'm following from that page looks roughly like the sketch below.
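(This is paraphrased from memory, so exact names like BotFunctions.generate_docs() may not match the docs precisely; TIME is just one of the illustrative example functions.)

    from datetime import datetime
    from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function

    @bot_function
    def TIME():
        """ Returns the current time. """
        return datetime.now().strftime('%-I:%M %p')

    model = NanoLLM.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct', api='mlc')

    # the example appends the generated function docs to the system prompt
    # so the model knows what it is allowed to call
    chat_history = ChatHistory(
        model,
        system_prompt='You are a helpful and friendly AI assistant. ' + BotFunctions.generate_docs(),
    )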

UserName-wang avatar Jun 08 '24 01:06 UserName-wang

Hi @UserName-wang , I have found the https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B model to be way better at function calling (it was finetuned for it), and showed it controlling HomeAssistant in the last research group meeting: https://www.jetson-ai-lab.com/research.html#past-meetings

I still need to update the documentation for this though, and I also need to make more modifications to add bot functions at runtime, etc. These tuned models are able to pass the function arguments reliably.
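For illustration (this is a made-up reply rather than captured model output, and get_time is a hypothetical tool), Hermes-2-Pro emits its calls as JSON wrapped in <tool_call> tags, which is what makes the arguments easy to parse:

    import json

    # hypothetical reply in Hermes-2-Pro's tool-calling format
    reply = '<tool_call>\n{"name": "get_time", "arguments": {"timezone": "UTC"}}\n</tool_call>'

    # pull the JSON out from between the tags and parse it
    payload = reply.split('<tool_call>')[1].split('</tool_call>')[0]
    call = json.loads(payload)
    print(call['name'], call['arguments'])  # get_time {'timezone': 'UTC'}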

dusty-nv avatar Jun 08 '24 02:06 dusty-nv

Hi! Thank you for your quick reply! Is there any model on NVIDIA NGC we can use via API key?


UserName-wang avatar Jun 08 '24 03:06 UserName-wang

There isn't one yet, but I hope/think we'll see more of these function-tuned models coming out. Let me know if you see or find any more in the smaller range of LLMs!

dusty-nv avatar Jun 08 '24 03:06 dusty-nv

Hi! I found that LangChain's OllamaFunctions can call tools and pass arguments reliably, but the response speed is very slow: the invoke call takes nearly 5 seconds! The pattern I tried looks roughly like the sketch below.
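(Sketched from memory off the LangChain docs, using their stock get_current_weather tool schema, so details may differ from what I actually ran.)

    from langchain_experimental.llms.ollama_functions import OllamaFunctions

    # bind a single tool schema (the weather tool is just the stock docs example)
    model = OllamaFunctions(model='llama3', format='json')
    model = model.bind_tools(
        tools=[{
            'name': 'get_current_weather',
            'description': 'Get the current weather in a given location',
            'parameters': {
                'type': 'object',
                'properties': {
                    'location': {'type': 'string', 'description': 'City name'},
                },
                'required': ['location'],
            },
        }],
    )

    # returns the tool name and arguments reliably, but takes ~5 seconds per call here
    print(model.invoke('What is the weather in Boston?'))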


UserName-wang avatar Jun 08 '24 04:06 UserName-wang

@dusty-nv what settings did you use in the Llamaspeak function calling video? I'm having trouble achieving similar performance using and modifying the example script. I tried the NousHermes model, being careful to use (and experiment with) their function-calling system prompt and chat-ml-tools template. It was pretty good at producing bare function calls, but failed in conversations of any complexity (hallucinating functions, failing to call functions, etc.). Also, functions are not executed: I see that functions are passed to model.generate(), but when I pull the thread through the code, I'm not sure I see where they are being executed. Is it in mlc._generate() around line 460?

function_next = function(stream.text)

Is the execution up to me, or is it automated? Just asking before I start reinventing a little wheel.

bigrobinson avatar Aug 20 '24 18:08 bigrobinson

Hi @bigrobinson, I need to update the docs for NousHermes, which uses multi-turn templates for the tool calls and responses. If you are using plugins, you can use that by connecting a plugin that exposes tools to the last output channel of the NanoLLM plugin:

https://github.com/dusty-nv/NanoLLM/blob/3a28b7ce78f95519a8020502817883862ca0c360/nano_llm/plugins/llm/nano_llm.py#L31

You can also mimic how that plugin calls chat_history.run_tools() to make your own generation script instead.
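In rough pseudocode, the flow there looks like this (argument names here are my shorthand, not the literal code; check the linked nano_llm.py for the actual signatures, including run_tools()):

    # rough sketch of the plugin's chat flow, not the literal code
    chat_history.append(role='user', msg=prompt)
    embedding, position = chat_history.embed_chat()

    reply = model.generate(
        embedding,
        kv_cache=chat_history.kv_cache,
        stop_tokens=chat_history.generate_stop_tokens(),
    )

    # store the bot reply, then execute any tool calls it contains and append
    # their outputs as new turns so the next generation pass can use the results
    chat_history.append(role='bot', text=reply)
    chat_history.run_tools(reply, tools=tools)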

dusty-nv avatar Aug 21 '24 05:08 dusty-nv