
Unknown conversation template: dolly

Open sleepwalker2017 opened this issue 2 years ago • 8 comments

Hi all, I put a lot of effort into getting this demo to run, but it crashes with the error below. Could anyone help?

./build/mlc_chat_cli --model dolly-v2-3b
Use MLC config: "dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
Loading model...
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /reload [local_id]  reload model `local_id` from disk, or reload the current model if `local_id` is not specified

From Template, name is dolly
[09:51:17] /root/codes/mlc-2/mlc-llm/cpp/conv_templates.cc:121: Unknown conversation template: dolly
Stack trace:
  [bt] (0) /root/codes/mlc-2/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x35) [0x7f895877083d]
  [bt] (1) ./build/mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3e) [0x560517171038]
  [bt] (2) ./build/mlc_chat_cli(tvm::runtime::detail::LogFatal::stream[abi:cxx11]()+0) [0x560517170db6]
  [bt] (3) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::Conversation::FromTemplate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x2d7) [0x7f89590c4cd3]
  [bt] (4) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::LLMChat::Reload(tvm::runtime::Module, tvm::runtime::String)+0x1d11) [0x7f89590d90e1]
  [bt] (5) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::LLMChatModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x2a6) [0x7f89590decb4]
  [bt] (6) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<mlc::llm::LLMChatModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x38) [0x7f89590fc96a]
  [bt] (7) ./build/mlc_chat_cli(+0x33fc2) [0x56051717efc2]
  [bt] (8) ./build/mlc_chat_cli(+0x2f11d) [0x56051717a11d]

sleepwalker2017 avatar May 29 '23 09:05 sleepwalker2017

After I modified the JSON file, it runs. But when I ask a question, the output is all wrong:

[screenshot: garbled model output]

sleepwalker2017 avatar May 29 '23 10:05 sleepwalker2017

It doesn't work well. [screenshot: model output]

sleepwalker2017 avatar May 29 '23 10:05 sleepwalker2017

cat dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json
{
    "model_lib": "dolly-v2-3b-q3f16_0",
    "local_id": "dolly-v2-3b-q3f16_0",
    "conv_template": "vicuna_v1.1",
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "shift_fill_factor": 0.3,
    "tokenizer_files": [
        "tokenizer.json"
    ]
}

sleepwalker2017 avatar May 29 '23 10:05 sleepwalker2017

Hi @sleepwalker2017, Dolly with 3-bit quantization doesn't work well. You can try compiling with q4f16_0 instead, which should produce better-quality output.
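
For example, something along these lines should work (I'm quoting the build.py flags from memory, so double-check them against the repo's README and adjust the target for your GPU):

python3 build.py --model dolly-v2-3b --quantization q4f16_0 --target cuda
./build/mlc_chat_cli --model dolly-v2-3b-q4f16_0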

yzh119 avatar May 29 '23 21:05 yzh119

q4f16_0, thank you! I'll try it!

sleepwalker2017 avatar May 30 '23 02:05 sleepwalker2017

I modified the JSON file dist/dolly-v2-3b-q4f16_0/params/mlc-chat-config.json.

The default template is dolly, and I changed it to vicuna_v1.1. Is that OK? It complains "Unknown conversation template: dolly" if I use the original JSON file.

@yzh119

sleepwalker2017 avatar May 30 '23 02:05 sleepwalker2017

Different models are trained on different conversation formats, so you shouldn't use the Vicuna conversation template for Dolly. Below is my suggested modification to mlc-chat-config.json:

{
    "model_lib": "dolly-v2-3b-q3f16_0",
    "local_id": "dolly-v2-3b-q3f16_0",
    "conv_config": {
        "system": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n",
        "roles": ["### Instruction", "### Response"],
        "messages": [],
        "offset": 0,
        "seps": ["\n\n", "### End"],
        "stop_tokens": [2],
        "separator_style": 0,
        "name": "dolly-v2"
    },
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "shift_fill_factor": 0.3,
    "tokenizer_files": [
        "tokenizer.json"
    ]
}
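
If I'm reading these fields correctly, the assembled prompt should end up looking roughly like the following (not verified yet, so treat it as a sketch):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: <your question>

### Response: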

I haven't had time to verify this myself yet, but I will. You can try tweaking it if my config doesn't work; some conversation template references can be found at https://github.com/mlc-ai/mlc-llm/blob/ce76c34bfd28babbf3a60dd46e15e0abbe10188e/cpp/conv_templates.cc .

We will write a tutorial on how to customize the conversation template soon.

yzh119 avatar May 30 '23 09:05 yzh119

@sleepwalker2017, the issue should have been fixed in #341.

yzh119 avatar Jun 07 '23 05:06 yzh119

Should be fixed on HEAD

junrushao avatar Jun 14 '23 05:06 junrushao