It Just Seemingly Crashes...?
🐛 Bug
Whenever I run it using mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0, it just stops partway through loading the model into memory with this error:
[12:31:23] C:\Miniconda\envs\mlc-llm-build\conda-bld\mlc-chat-nightly-package_1686373785773\work\3rdparty\tvm\src\runtime\vulkan\vulkan_buffer.cc:61:
An error occurred during the execution of TVM. For more information, please see: https://tvm.apache.org/docs/errors.html
Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-2: VK_ERROR_OUT_OF_DEVICE_MEMORY
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
Well, it says out of device memory, but I took a look at the VRAM usage and it never uses it all: it gets near 2.9 GB and then this error occurs. I am also using the integrated GPU for normal desktop rendering, so it is not that either.
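For reference, one way to double-check how much device-local memory the Vulkan driver actually reports is the vulkaninfo tool; this assumes the Vulkan SDK (which ships it) is installed, and the memoryHeaps section of the dump lists each heap with its size and whether it carries VK_MEMORY_HEAP_DEVICE_LOCAL_BIT:
vulkaninfo > vulkaninfo.txt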
To Reproduce
Steps to reproduce the behavior:
1. mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0
Environment
- Platform: Vulkan
- Operating system: Windows 11
- Device: GTX 1650
- How you installed MLC-LLM (conda, source): yes
- How you installed TVM-Unity (pip, source): I don't think so
- CUDA/cuDNN version: 12.1
- Any other relevant information:
Additional context
Please try RedPajama-INCITE-Chat-3B-v1, which is friendlier to 4 GB of VRAM. 4 GB of VRAM is not enough for Vicuna-7B.
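A minimal invocation, assuming the prebuilt weights follow the same local-id naming pattern as the Vicuna package above (the exact quantization suffix may differ depending on which package you download):
mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_0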