[Bug] Cannot find env function `vm.builtin.memory_manager.clear`
🐛 Bug
mlc_chat_cli --model dolly-v2-3b
Use MLC config: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
...
Loading model...
[01:33:21] /mnt/f/mlc-llm/cpp/llm_chat.cc:884:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (fclear_memory_manager) is false: Cannot find env function vm.builtin.memory_manager.clear
Stack trace:
[bt] (0) /usr/local/lib/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7f43b814346c]
[bt] (1) mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3b) [0x562404bf8cbb]
[bt] (2) /usr/local/lib/libmlc_llm.so(+0x1a6526) [0x7f43b843f526]
[bt] (3) /usr/local/lib/libmlc_llm.so(+0x1c2334) [0x7f43b845b334]
[bt] (4) mlc_chat_cli(+0x15079) [0x562404bfc079]
[bt] (5) mlc_chat_cli(+0xec76) [0x562404bf5c76]
[bt] (6) mlc_chat_cli(+0x9a20) [0x562404bf0a20]
[bt] (7) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f43b7c47d90]
[bt] (8) /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f43b7c47e40]
To Reproduce
Steps to reproduce the behavior:
mlc_chat_cli --model dolly-v2-3b
Expected behavior
The model loads and generates a coherent response without crashing.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA
- Operating system (e.g. Ubuntu/Windows/MacOS/...): WSL-Ubuntu
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): RTX 3070 Ti
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): 3.10
- GPU driver version (if applicable): 525
- CUDA/cuDNN version (if applicable): 11.8
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): see Additional context
- Any other relevant information:
Additional context
The 3rdparty/tvm submodule is on the unity branch, commit id 0328e98b940adb970185ce97f015e924b640ba09.
When I removed the call to ClearGlobalMemoryManager and rebuilt, the CLI no longer crashes, but it keeps emitting response text for a long time and does not stop (see the session at the end).
PackedFunc GetFunction(const String& name, const ObjectPtr<Object>& sptr_to_self) final {
  if (name == "reload") {
    return PackedFunc([this, sptr_to_self](TVMArgs args, TVMRetValue* rv) {
      chat_ = nullptr;
      // ClearGlobalMemoryManager();  // commented out to avoid the failing lookup
      chat_ = std::make_unique<LLMChat>(LLMChat(device_));
      ....
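For reference, the check that fails at cpp/llm_chat.cc:884 is a lookup of vm.builtin.memory_manager.clear in TVM's global function registry. Below is a minimal sketch of what ClearGlobalMemoryManager presumably does, assuming TVM's C++ registry API; the body is reconstructed from the error message, not the verbatim source:

#include <tvm/runtime/logging.h>
#include <tvm/runtime/registry.h>

// Sketch: look up the VM builtin and invoke it. Registry::Get returns
// nullptr when the function was never registered, e.g. when the linked
// libtvm_runtime.so was not built from the unity branch with the relax
// VM builtins, which trips the ICHECK below with exactly this message.
void ClearGlobalMemoryManager() {
  const tvm::runtime::PackedFunc* fclear_memory_manager =
      tvm::runtime::Registry::Get("vm.builtin.memory_manager.clear");
  ICHECK(fclear_memory_manager)
      << "Cannot find env function vm.builtin.memory_manager.clear";
  (*fclear_memory_manager)();
}

If the lookup returns nullptr, a likely cause is that the /usr/local/lib/libtvm_runtime.so found at load time does not match the TVM Unity commit used to build mlc-llm, so checking which runtime the CLI actually loads may address the root cause rather than skipping the call.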
$ mlc_chat_cli --model dolly-v2-3b
Use MLC config: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/params/ndarray-cache.json"
Use model library: "/mnt/f/mlc-llm/dist/dolly-v2-3b-q3f16_0/dolly-v2-3b-q3f16_0-cuda.so"
...
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
Loading model...
Loading finished
Running system prompts...
System prompts finished
### Instruction: who are you
### Response: The end for itself
If you are not a human being
The end of itself
The beginning of itself
The end for itself
The end of itself
The end for the
The end of itself
The end of itself
The end of itself
The end of
The end of
The end of
The end of
The end of
The end of
The end of