[Bug] Cannot find env function `vm.builtin.memory_manager.clear`
🐛 Bug
mlc_chat_cli --model dolly-v2-3b
Use MLC config: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
...
Loading model...
[01:33:21] /mnt/f/mlc-llm/cpp/llm_chat.cc:884:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (fclear_memory_manager) is false: Cannot find env function vm.builtin.memory_manager.clear
Stack trace:
[bt] (0) /usr/local/lib/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7f43b814346c]
[bt] (1) mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3b) [0x562404bf8cbb]
[bt] (2) /usr/local/lib/libmlc_llm.so(+0x1a6526) [0x7f43b843f526]
[bt] (3) /usr/local/lib/libmlc_llm.so(+0x1c2334) [0x7f43b845b334]
[bt] (4) mlc_chat_cli(+0x15079) [0x562404bfc079]
[bt] (5) mlc_chat_cli(+0xec76) [0x562404bf5c76]
[bt] (6) mlc_chat_cli(+0x9a20) [0x562404bf0a20]
[bt] (7) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f43b7c47d90]
[bt] (8) /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f43b7c47e40]
To Reproduce
Steps to reproduce the behavior:
mlc_chat_cli --model dolly-v2-3b
Expected behavior
The model loads and generates a coherent response without crashing.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA
- Operating system (e.g. Ubuntu/Windows/MacOS/...): WSL-Ubuntu
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): RTX 3070 Ti
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): 3.10
- GPU driver version (if applicable): 525
- CUDA/cuDNN version (if applicable): 11.8
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): see Additional context
- Any other relevant information:
Additional context
The 3rdparty/tvm submodule is on the unity branch, commit id 0328e98b940adb970185ce97f015e924b640ba09.
When I removed the call to ClearGlobalMemoryManager and rebuilt, the CLI no longer crashes, but it keeps emitting response text for a long time and does not stop (see the session at the end).
PackedFunc GetFunction(const String& name, const ObjectPtr<Object>& sptr_to_self) final {
  if (name == "reload") {
    return PackedFunc([this, sptr_to_self](TVMArgs args, TVMRetValue* rv) {
      chat_ = nullptr;
      // ClearGlobalMemoryManager();  // commented out to avoid the failing lookup
      chat_ = std::make_unique<LLMChat>(LLMChat(device_));
      ....
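For reference, the check that fails at cpp/llm_chat.cc:884 is a lookup of vm.builtin.memory_manager.clear in TVM's global function registry. Below is a minimal sketch of what ClearGlobalMemoryManager presumably does, assuming TVM's C++ registry API; the body is reconstructed from the error message, not the verbatim source:

#include <tvm/runtime/logging.h>
#include <tvm/runtime/registry.h>

// Sketch: look up the VM builtin and invoke it. Registry::Get returns
// nullptr when the function was never registered, e.g. when the linked
// libtvm_runtime.so was not built from the unity branch with the relax
// VM builtins, which trips the ICHECK below with exactly this message.
void ClearGlobalMemoryManager() {
  const tvm::runtime::PackedFunc* fclear_memory_manager =
      tvm::runtime::Registry::Get("vm.builtin.memory_manager.clear");
  ICHECK(fclear_memory_manager)
      << "Cannot find env function vm.builtin.memory_manager.clear";
  (*fclear_memory_manager)();
}

If the lookup returns nullptr, a likely cause is that the /usr/local/lib/libtvm_runtime.so found at load time does not match the TVM Unity commit used to build mlc-llm, so checking which runtime the CLI actually loads may address the root cause rather than skipping the call.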
$ mlc_chat_cli --model dolly-v2-3b
Use MLC config: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
...
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
Loading model...
Loading finished
Running system prompts...
System prompts finished
### Instruction: who are you
### Response: The end for itself
If you are not a human being
The end of itself
The beginning of itself
The end for itself
The end of itself
The end for the
The end of itself
The end of itself
The end of itself
The end of
The end of
The end of
The end of
The end of
The end of
The end of