LocalAI Is there a template configuration that supports Llama3-ChatQA-1.5-70B?

use this config can't answer the question

name: llama3-70b-chatQA
mmap: true
context_size: 8192
#threads: 11
#gpu_layers: 90
f16: true
parameters:
  model: Llama3-ChatQA-1.5-70B-Q4_K_M.gguf
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
template:
  chat_message: |
    <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>

    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content -}}
    {{ else if .FunctionCall -}}
    {{ toJson .FunctionCall -}}
    {{ end -}}
    <|eot_id|>
  function: |
    <|start_header_id|>system<|end_header_id|>

    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    <tools>
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    </tools>
    Use the following pydantic model json schema for each tool call you will make:
    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    Function call:
  chat: |
    <|begin_of_text|>{{.Input }}
    <|start_header_id|>assistant<|end_header_id|>
  completion: |
    {{.Input}}
stopwords:
- <|im_end|>
- <dummy32000>
- <|eot_id|>
- <|end_of_text|>
usage: |
      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "llama3-70b-chatQA",
          "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
      }'

May 11 '24 06:05 WuQic

Hi @WuQic I tested this model in transformer backend with the OpenVINO version. Was not particularly impressed, If you want to give a try this is the model definition.

name: ChatQA
backend: transformers
parameters:
  model: fakezeta/Llama3-ChatQA-1.5-8B-ov-int8
context_size: 8192
type: OVModelForCausalLM
template:
  use_tokenizer_template: true
stopwords:
- "<|eot_id|>"
- "<|end_of_text|>"

The template is in the tokenizer_config.json file coming from Nvidia.

May 11 '24 16:05 fakezeta

I tested the 8b model, not recommended in my opinion.

May 24 '24 16:05 thiner