[Question] Running mlc_llm in a multi-phase container build
❓ General Questions
I'm trying to build a containerized application with Vicuna-7B and mlc-llm for Jetsons running JetPack 6 (JP6). This is my multi-phase Containerfile:
FROM docker.io/dustynv/mlc:r36.2.0 AS builder
# Install and enable git-lfs
RUN apt-get update && apt-get install -y git-lfs
RUN git lfs install
# Clone local copy of Vicuna-7b-v1.5
RUN mkdir /opt/models
RUN cd /opt/models && git clone https://huggingface.co/lmsys/vicuna-7b-v1.5
# Install grpcio-tools and compile protobuf protocols
RUN pip install grpcio-tools
COPY ./protobuf /protobuf
RUN cd /protobuf/ && python3 -m grpc_tools.protoc -I./ --python_out=. --pyi_out=. --grpc_python_out=. ./vicunaserving.proto
# Compile vicuna-7b-v1.5
RUN python3 -m mlc_llm.build \
--model vicuna-7b-v1.5 \
--quantization q4f16_ft \
--artifact-path /opt/ \
--max-seq-len 4096 \
--target cuda
### End builder image build ###
# Start main image build
FROM nvcr.io/nvidia/l4t-base:r36.2.0
# Copy vicunaserver, compiled model, and whl files for pip install
COPY ./vicunaserver/ /opt/vicunaserver/
COPY --from=builder /opt/vicuna-7b-v1.5-q4f16_ft/ /opt/vicunaserver/
COPY --from=builder /opt/mlc_llm-0.1.dev930+g607dc5a-py3-none-any.whl /tmp
COPY --from=builder /opt/torch-2.1.0-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/torchvision-0.16.0+fbb4cc5-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/mlc_chat-0.1.dev930+g607dc5a-cp310-cp310-linux_aarch64.whl /tmp
COPY --from=builder /opt/tvm-0.15.dev48+g59c355604-cp310-cp310-linux_aarch64.whl /tmp
# Install dependencies with apt/pip
RUN apt update && apt install -y python3-pip
RUN python3.10 -m pip install /tmp/*.whl
# Install essential CUDA packages
RUN apt-cache search cuda-*
RUN apt install -y --no-install-recommends --no-install-suggests cuda-minimal-build-12-2 cuda-nvrtc-12-2 libcudnn8 libcublas-12-2 libcurand-12-2
# Copy protobuf defs
COPY --from=builder /protobuf/ /opt/vicunaserver
# Set workdir to where our server is
WORKDIR /opt/vicunaserver/
# Set default CMD to run our server
CMD python3 vicunaserver.py
When I run the container, I get the following error:
root@402c73bca1f5:/opt/vicunaserver# python3 vicunaserver.py
Traceback (most recent call last):
File "/opt/vicunaserver/vicunaserver.py", line 1, in <module>
from mlc_chat import ChatModule, ChatConfig, ConvConfig
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/__init__.py", line 5, in <module>
from . import protocol, serve
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/serve/__init__.py", line 4, in <module>
from .. import base
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/base.py", line 6, in <module>
import tvm
File "/usr/local/lib/python3.10/dist-packages/tvm/__init__.py", line 26, in <module>
from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 78, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 64, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libfpA_intB_gemm.so: cannot open shared object file: No such file or directory
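For context on the failure mode: TVM loads this library by bare soname via ctypes, so the dynamic loader only searches the standard locations (ld cache, LD_LIBRARY_PATH, ...) rather than the package directory. A stdlib-only illustration, using a deliberately nonexistent name:

```python
import ctypes

# Loading a shared library by bare soname makes the dynamic loader
# search the standard paths; if the file lives elsewhere (e.g. inside
# a Python package directory), the load fails just like the traceback.
try:
    ctypes.CDLL("libthis_name_does_not_exist_example.so")
except OSError as exc:
    print(f"loader search failed: {exc}")
```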
I see that library is located in:
root@402c73bca1f5:/opt/vicunaserver# ll /usr/local/lib/python3.10/dist-packages/tvm/libfpA_intB_gemm.so
-rwxr-xr-x. 1 root root 190683216 Jun 5 09:04 /usr/local/lib/python3.10/dist-packages/tvm/libfpA_intB_gemm.so*
So if I add export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tvm/:$LD_LIBRARY_PATH, my vicuna server starts:
root@402c73bca1f5:/opt/vicunaserver# python3 vicunaserver.py
[2024-06-05 09:38:00] INFO vicunaserver.py:51: Starting server on [::]:50051
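To avoid exporting the variable by hand in every container, the path can be baked into the runtime stage of the Containerfile. A sketch, assuming TVM stays installed in the same dist-packages location:

```dockerfile
# Make TVM's bundled native libraries (e.g. libfpA_intB_gemm.so)
# visible to the dynamic loader for every process in the image.
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tvm:${LD_LIBRARY_PATH}

# Alternative: register the directory with ldconfig instead, which
# also covers processes that don't inherit LD_LIBRARY_PATH.
RUN echo "/usr/local/lib/python3.10/dist-packages/tvm" \
      > /etc/ld.so.conf.d/tvm.conf && ldconfig
```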
However, when I make use of it (the demo is basically a webserver, YOLOv8, and this vicuna server), I get an error which seems to be about a dependency mismatch:
root@vicunaserver-5ff46766b9-mznrs:/opt/vicunaserver# python3 vicunaserver.py
[2024-06-05 10:19:34] INFO vicunaserver.py:51: Starting server on [::]:50051
[2024-06-05 10:19:45] INFO vicunaserver.py:15: Serving the requested inferencing
[2024-06-05 10:19:47] INFO auto_device.py:76: Found device: cuda:0
[2024-06-05 10:19:49] INFO auto_device.py:85: Not found device: rocm:0
[2024-06-05 10:19:51] INFO auto_device.py:85: Not found device: metal:0
[2024-06-05 10:19:53] INFO auto_device.py:85: Not found device: vulkan:0
[2024-06-05 10:19:55] INFO auto_device.py:85: Not found device: opencl:0
[2024-06-05 10:19:55] INFO auto_device.py:33: Using device: cuda:0
[2024-06-05 10:19:55] INFO chat_module.py:373: Using model folder: /opt/vicunaserver/params
[2024-06-05 10:19:55] INFO chat_module.py:374: Using mlc chat config: /opt/vicunaserver/params/mlc-chat-config.json
[2024-06-05 10:19:55] INFO chat_module.py:560: Using library model: /opt/vicunaserver/vicuna-7b-v1.5-q4f16_ft-cuda.so
[2024-06-05 10:19:57] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 160, in main
metadata = _extract_metadata(parsed.model_lib)
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/relax_vm.py", line 136, in __getitem__
return self.module[key]
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 192, in __getitem__
return self.get_function(name)
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 176, in get_function
raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function '_metadata'
Any clue?
It seems to be complaining about this function: https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/cli/model_metadata.py#L20
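The linked reader essentially asks the loaded module for a function named `_metadata` and, when the module predates that convention, logs the failure and treats the lib as legacy. A stdlib-only sketch of that lookup-with-fallback pattern (the class and names here are hypothetical, not the mlc_chat API):

```python
class LoadedModule:
    """Stand-in for a runtime module that exposes functions by name."""

    def __init__(self, functions):
        self._functions = functions

    def get_function(self, name):
        if name not in self._functions:
            raise AttributeError(f"Module has no function '{name}'")
        return self._functions[name]


def read_metadata(module):
    # Mirror the fallback: a lib built without "_metadata" is treated
    # as a legacy model lib rather than a fatal error.
    try:
        return module.get_function("_metadata")()
    except AttributeError:
        return None  # legacy model lib: no metadata section


legacy = LoadedModule({})
modern = LoadedModule({"_metadata": lambda: '{"quantization": "q4f16_ft"}'})
assert read_metadata(legacy) is None
assert read_metadata(modern) == '{"quantization": "q4f16_ft"}'
```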
BTW, I tried using dustynv's image as the base instead of the l4t-base one, and I'm getting the same results :-(
Hi @oglok, it seems you were using an older version that is now being deprecated.
Hey @tqchen, is there a container image I can use?
I see Dockerfiles in the mlc-ai/packages and mlc-ai/env repos, but nothing ready to use...
As of now, unfortunately, we don't have a container file, so building from source for Jetson may be needed.
@tqchen I realize there is no wheel package with CUDA support for ARM devices. Am I the only one interested in running this on a Jetson?
@oglok jetson-containers builds MLC from source; there are some patches I apply (mostly to MLC/TVM third-party submodules), so it's not on the latest. However, there is a version using the mlc_chat builder (what I have tagged as version 0.1.1).
Also, IIRC, AttributeError: Module has no function '_metadata' is just a warning, and the program typically continues without issue after that. Does it halt for you, or do you have some other problem?
Actually, you are right @dusty-nv. The container does not halt; I thought it wasn't behaving properly, but apparently it is.
What's the problem with generating wheel packages for CUDA on ARM, or for Jetson?
@oglok the MLC/TVM wheels that jetson-containers builds are here: http://jetson.webredirect.org/jp6/cu122
It is a non-trivial build process with all the bells & whistles enabled, and as you have found there are extra dependencies and files you need to install.
❤️