It Just Seemingly Crashes...?
🐛 Bug
Whenever I run it using mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0, it just stops partway through loading the model into memory with this error:
[12:31:23] C:\Miniconda\envs\mlc-llm-build\conda-bld\mlc-chat-nightly-package_1686373785773\work\3rdparty\tvm\src\runtime\vulkan\vulkan_buffer.cc:61:
An error occurred during the execution of TVM. For more information, please see: https://tvm.apache.org/docs/errors.html
Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-2: VK_ERROR_OUT_OF_DEVICE_MEMORY
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
Well, it says out of device memory, but I took a look at the VRAM usage and it never uses it all: it gets near 2.9 GB and then this error occurs. I am also using the integrated GPU for normal desktop rendering, so it is not that either.
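For reference, one way to double-check how much device-local memory the Vulkan driver actually reports is the vulkaninfo tool; this assumes the Vulkan SDK (which ships it) is installed, and the memoryHeaps section of the dump lists each heap with its size and whether it carries VK_MEMORY_HEAP_DEVICE_LOCAL_BIT:
vulkaninfo > vulkaninfo.txt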
To Reproduce
Steps to reproduce the behavior:
1. mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0
Environment
- Platform: Vulkan
- Operating system: Windows 11
- Device: GTX 1650
- How you installed MLC-LLM (conda, source): yes
- How you installed TVM-Unity (pip, source): I don't think so
- CUDA/cuDNN version: 12.1
- Any other relevant information:
Additional context
Please try RedPajama-INCITE-Chat-3B-v1, which is friendlier to 4 GB of VRAM. 4 GB of VRAM is not enough for Vicuna-7B.
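A minimal invocation, assuming the prebuilt weights follow the same local-id naming pattern as the Vicuna package above (the exact quantization suffix may differ depending on which package you download):
mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_0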