mlc-llm
Unknown conversation template: dolly
Hi all, it took me a lot of effort to get this demo running, but it crashes with the error below. Could anyone give some support?
./build/mlc_chat_cli --model dolly-v2-3b
Use MLC config: "dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
From Template, name is dolly
[09:51:17] /root/codes/mlc-2/mlc-llm/cpp/conv_templates.cc:121: Unknown conversation template: dolly
Stack trace:
[bt] (0) /root/codes/mlc-2/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x35) [0x7f895877083d]
[bt] (1) ./build/mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3e) [0x560517171038]
[bt] (2) ./build/mlc_chat_cli(tvm::runtime::detail::LogFatal::stream[abi:cxx11]()+0) [0x560517170db6]
[bt] (3) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::Conversation::FromTemplate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x2d7) [0x7f89590c4cd3]
[bt] (4) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::LLMChat::Reload(tvm::runtime::Module, tvm::runtime::String)+0x1d11) [0x7f89590d90e1]
[bt] (5) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(mlc::llm::LLMChatModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x2a6) [0x7f89590decb4]
[bt] (6) /root/codes/mlc-2/mlc-llm/build/libmlc_llm.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<mlc::llm::LLMChatModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x38) [0x7f89590fc96a]
[bt] (7) ./build/mlc_chat_cli(+0x33fc2) [0x56051717efc2]
[bt] (8) ./build/mlc_chat_cli(+0x2f11d) [0x56051717a11d]
After I modify the JSON file, it runs. But when I ask a question, the output is all wrong. It doesn't work well.
cat dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json
{
    "model_lib": "dolly-v2-3b-q3f16_0",
    "local_id": "dolly-v2-3b-q3f16_0",
    "conv_template": "vicuna_v1.1",
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "shift_fill_factor": 0.3,
    "tokenizer_files": [
        "tokenizer.json"
    ]
}
@sleepwalker2017, Dolly with 3-bit quantization doesn't work well. You can try compiling with q4f16_0 instead, which should produce output with better quality.
q4f16_0, thank you! I'll try it!
I modified the JSON file dist/dolly-v2-3b-q4f16_0/params/mlc-chat-config.json. The default template is dolly, but I changed it to vicuna_v1.1. Is that OK? It complains "Unknown conversation template: dolly" if I use the original JSON file.
@yzh119
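The error itself comes from Conversation::FromTemplate in cpp/conv_templates.cc: the name given in "conv_template" is looked up against the set of built-in templates, and an unknown name is fatal; at this commit "dolly" is not among them. Below is a minimal sketch of that lookup pattern, purely illustrative (the names and structure here are not the actual mlc-llm code):

// Illustrative sketch only -- the real registry lives in cpp/conv_templates.cc
// and its contents change between commits.
#include <cstdlib>
#include <functional>
#include <iostream>
#include <map>
#include <string>

struct Conversation {
  std::string name;
};

Conversation FromTemplate(const std::string& name) {
  // Hypothetical registry of built-in templates; only names registered here
  // can be used as "conv_template" in mlc-chat-config.json.
  static const std::map<std::string, std::function<Conversation()>> registry = {
      {"vicuna_v1.1", [] { return Conversation{"vicuna_v1.1"}; }},
      {"redpajama_chat", [] { return Conversation{"redpajama_chat"}; }},
  };
  auto it = registry.find(name);
  if (it == registry.end()) {
    // In mlc-llm this is a LOG(FATAL), which is what produces the
    // "Unknown conversation template: dolly" stack trace above.
    std::cerr << "Unknown conversation template: " << name << std::endl;
    std::abort();
  }
  return it->second();
}

int main() {
  FromTemplate("dolly");  // aborts here: "dolly" is not a registered template
  return 0;
}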
Different models are trained on different conversation formats, so you shouldn't use the Vicuna conversation template for Dolly. Below is my suggested modification to mlc-chat-config.json:
{
    "model_lib": "dolly-v2-3b-q3f16_0",
    "local_id": "dolly-v2-3b-q3f16_0",
    "conv_config": {
        "system": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n",
        "roles": ["### Instruction", "### Response"],
        "messages": [],
        "offset": 0,
        "seps": ["\n\n", "### End"],
        "stop_tokens": [2],
        "separator_style": 0,
        "name": "dolly-v2"
    },
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "shift_fill_factor": 0.3,
    "tokenizer_files": [
        "tokenizer.json"
    ]
}
I haven't had time to verify this myself yet, but I will. You can try tweaking it if my config doesn't work; some conversation template references can be found at https://github.com/mlc-ai/mlc-llm/blob/ce76c34bfd28babbf3a60dd46e15e0abbe10188e/cpp/conv_templates.cc .
We will write a tutorial on how to customize the conversation template soon.
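Roughly, the config above should expand into a prompt like the following sketch produces; this is only illustrative (the actual expansion is implemented in cpp/llm_chat.cc and cpp/conv_templates.cc) and details such as spacing around the separators may differ:

// Illustrative sketch of how the conv_config above turns into a prompt; not
// the actual mlc-llm implementation.
#include <iostream>
#include <string>
#include <vector>

int main() {
  // Fields copied from the suggested conv_config.
  std::string system =
      "Below is an instruction that describes a task. "
      "Write a response that appropriately completes the request.\n\n";
  std::vector<std::string> roles = {"### Instruction", "### Response"};
  std::vector<std::string> seps = {"\n\n", "### End"};

  std::string user_input = "What is the capital of France?";  // example turn

  // separator_style 0: each turn becomes "<role>: <message>" followed by the
  // alternating separator; the response role is left open so the model
  // continues generating after "### Response:".
  std::string prompt = system;
  prompt += roles[0] + ": " + user_input + seps[0];
  prompt += roles[1] + ":";

  std::cout << prompt << std::endl;
  return 0;
}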
@sleepwalker2017, the issue should have been fixed in #341.
Should be fixed on HEAD