Docker support
Please consider adding a Dockerfile and docker-compose file to the repository.
I gave the Dockerfile below a quick try with no luck; sharing it in case someone more knowledgeable knows how to move forward (tested on an Apple M1 Pro, macOS 12.6.5).
Full build log: https://gist.github.com/felipesabino/9a97b62d35e6c6965093bd8410a83390
```dockerfile
# https://github.com/mlc-ai/mlc-llm/issues/33

# Use the x86_64 Miniconda3 base image
FROM --platform=linux/amd64 continuumio/miniconda3

# Set the working directory
WORKDIR /app

# Create and activate the conda environment
RUN conda create -y -n mlc-chat
SHELL ["conda", "run", "-n", "mlc-chat", "/bin/bash", "-c"]

# Add the conda-forge channel and install Git, Git-LFS, and mlc-chat-nightly
RUN conda config --append channels conda-forge && \
    conda install -y git git-lfs && \
    conda install -y -c mlc-ai mlc-chat-nightly

# Clone the repositories and set up the model
RUN mkdir -p dist && \
    git lfs install && \
    git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b && \
    git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib

# Set the entrypoint for running the mlc_chat_cli
ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "mlc-chat", "mlc_chat_cli"]
```
Executing:

```
$ docker build -t mlc-chat-cli .
[+] Building 499.7s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
.....
$ docker run -it --rm mlc-chat-cli
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [16:41:06] /home/runner/work/utils/utils/tvm/src/runtime/vulkan/vulkan_instance.cc:111:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-9: VK_ERROR_INCOMPATIBLE_DRIVER
Stack trace:
[bt] (0) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x27) [0x400204ac67]
[bt] (1) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(+0x3f375) [0x4001fe8375]
[bt] (2) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(tvm::runtime::vulkan::VulkanInstance::VulkanInstance()+0x1a47) [0x40021395b7]
[bt] (3) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(tvm::runtime::vulkan::VulkanDeviceAPI::VulkanDeviceAPI()+0x40) [0x40021357c0]
[bt] (4) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(tvm::runtime::vulkan::VulkanDeviceAPI::Global()+0x4c) [0x4002135b5c]
[bt] (5) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(+0x18cb9d) [0x4002135b9d]
[bt] (6) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(+0x6bff4) [0x4002014ff4]
[bt] (7) /opt/conda/envs/mlc-chat/bin/../lib/libtvm_runtime.so(+0x6c597) [0x4002015597]
[bt] (8) mlc_chat_cli(+0xe4f0) [0x400000e4f0]
qemu: uncaught target signal 6 (Aborted) - core dumped
/tmp/tmpy6jzslz9: line 3: 29 Aborted mlc_chat_cli
ERROR conda.cli.main_run:execute(47): `conda run mlc_chat_cli` failed. (See above for error)
```
Hi. I tried with your Dockerfile and I think a step is missing. From the MLC LLM instructions: "On Windows and Linux, the chatbot application runs on GPU via the Vulkan platform. For Windows and Linux users, please install the latest Vulkan driver. For NVIDIA GPU users, please make sure to install the Vulkan driver, as the CUDA driver may not be good." I am only guessing this from the instructions and from the error the container generates when run. I am new to Docker and MLC LLM but would also like to give it a try. If you find a solution to the problem, I would appreciate it if you posted it here. Thanks and good luck.
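If anyone wants to experiment with that suggestion, a rough sketch of installing the Vulkan loader and Mesa's drivers into the image is below. The Debian package names are an assumption (the Miniconda3 base image is Debian-based), and this still cannot provide a GPU that the container or VM does not expose.

```dockerfile
# Sketch only: add the Vulkan loader, Mesa's Vulkan drivers, and
# vulkaninfo for debugging. Package names assume a Debian base.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libvulkan1 mesa-vulkan-drivers vulkan-tools && \
    rm -rf /var/lib/apt/lists/*
```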
Update: I opened up a repo (https://github.com/junrushao/llm-perf-bench) of Dockerfiles to help reproduce CUDA performance numbers. The takeaway: MLC LLM is around 30% faster than ExLlama.
I’m not a Docker expert, and I still think Docker isn’t always the best way for MLC LLM to demonstrate universal deployment, since some drivers may not be available inside a container (Metal/Vulkan), but it does help for CUDA cases.
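For the CUDA case, the usual pattern is to pass the host GPU through with the NVIDIA Container Toolkit, along these lines (the image tag is just an example):

```sh
# Requires the NVIDIA Container Toolkit on the host;
# the image tag is illustrative.
docker run --gpus all -it --rm mlc-chat-cli
```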
Hey @boylucky and @felipesabino, were you able to get it to work after installing the Vulkan drivers?
I'm using a Mac M1 Pro and am also getting `InternalError: Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-9: VK_ERROR_INCOMPATIBLE_DRIVER`.
Is there a way to disable Vulkan in TVM?
@arif599 Docker on macOS uses QEMU to run a Linux VM and then runs Docker inside it, and I don't think that Linux virtual machine has GPU access, so you probably cannot use Docker-based MLC on macOS.
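You can see the emulation in the warning from `docker run` above; a quick sanity check of what the container actually sees:

```sh
# On an Apple Silicon host, the amd64 image runs under QEMU emulation:
docker run --rm --platform=linux/amd64 alpine uname -m  # x86_64 (emulated)
docker run --rm alpine uname -m                         # aarch64 (native)
```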