[Bug] Cannot compile a custom model to a wasm file that works in the web browser
🐛 Bug
Hello, I am trying to compile my own model type (based on Llama 2) to wasm, but I cannot get it to work no matter what I do.
I followed the instructions here (https://llm.mlc.ai/docs/compilation/compile_models.html).
To Reproduce
Steps to reproduce the behavior:
This is roughly the script I used to compile.
# Wasm prerequisites
emcc -v # emsdk is installed elsewhere, it works
git clone https://github.com/mlc-ai/mlc-llm.git --recursive
cd mlc-llm
./web/prep_emcc_deps.sh
# Conda env
conda create -n mlc python=3.11 -y
conda activate mlc
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122
# Convert and compile
mlc_llm convert_weight ~/models/test_model --quantization q4f16_1 -o ~/models/mlc_wasm/ --device cuda
mlc_llm gen_config ~/models/test_model \
--quantization q4f16_1 --conv-template llama_default \
-o ~/models/mlc_wasm
mlc_llm compile ~/models/mlc_wasm/mlc-chat-config.json \
--device webgpu -o ~/models/mlc_wasm/model.wasm
There are no errors and everything appears to work. After it finishes, the output directory looks like this:
ls -alF ~/models/mlc_wasm
-rw-r--r-- 1 root root 2.7K Apr 19 15:14 added_tokens.json
-rw-r--r-- 1 root root 619K Apr 19 15:14 merges.txt
-rw-r--r-- 1 root root 1.1K Apr 19 15:29 mlc-chat-config.json
-rwxr-xr-x 1 root root 4.1M Apr 19 15:16 model.wasm
-rw-r--r-- 1 root root 98K Apr 19 15:13 ndarray-cache.json
-rw-r--r-- 1 root root 48M Apr 19 15:13 params_shard_0.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_10.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_11.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_12.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_13.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_14.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_15.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_16.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_17.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_18.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_19.bin
-rw-r--r-- 1 root root 48M Apr 19 15:13 params_shard_1.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_20.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_21.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_22.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_23.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_24.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_25.bin
-rw-r--r-- 1 root root 9.1M Apr 19 15:13 params_shard_26.bin
-rw-r--r-- 1 root root 31M Apr 19 15:13 params_shard_2.bin
-rw-r--r-- 1 root root 31M Apr 19 15:13 params_shard_3.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_4.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_5.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_6.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_7.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_8.bin
-rw-r--r-- 1 root root 30M Apr 19 15:13 params_shard_9.bin
-rw-r--r-- 1 root root 35K Apr 19 15:14 tokenizer_config.json
-rw-r--r-- 1 root root 2.4M Apr 19 15:14 tokenizer.json
-rw-r--r-- 1 root root 944K Apr 19 15:14 vocab.json
Then I launch a simple static HTTP server to serve this directory (roughly as sketched after the config below), modify the https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat example to point at that server, and load my model.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const my_models = [
  {
    model_id: "mlc_test",
    model_url: "http://localhost:8080",
    model_lib_url: "http://localhost:8080/model.wasm",
  },
];
const new_model_list = my_models.concat(prebuiltAppConfig.model_list);

export default {
  "model_list": new_model_list,
  "use_web_worker": true,
};
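For completeness, the server side is roughly the following. This is only a minimal sketch, assuming port 8080 (to match model_url and model_lib_url above); any static file server should do, as long as the page is allowed to fetch from it.
# Serve the compiled artifacts over HTTP on port 8080 (sketch, not the exact
# command I used; any static file server works).
cd ~/models/mlc_wasm
python3 -m http.server 8080
# If the chat page runs on a different origin, the server also has to send
# CORS headers; Python's built-in http.server does not add them by default.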
This is where the error occurs. It seems something is not done right when compiling the model to wasm, but I just could not get it to work. I tried many things, including building TVM and mlc-llm myself, but nothing worked.
Expected behavior
Environment
- Platform (e.g. WebGPU):
- Operating system (e.g. Ubuntu 20.04):
- Device (A100)
- How you installed MLC-LLM (conda):
- How you installed TVM-Unity (pip):
- Python version (e.g. 3.11):
- GPU driver version (Driver Version: 545.23.08):
- CUDA/cuDNN version (CUDA Version: 12.3):
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: 12.2
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: ON
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: ON
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: d694451c580a931116a2c93571f21f7d791c7fa0
USE_VULKAN: ON
USE_RUST_EXT: OFF
USE_CUTLASS: ON
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-04-18 10:05:07 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: ON
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER: ON
USE_CUBLAS: ON
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
- Any other relevant information:
Additional context
cc @CharlieFRuan
Hi @skyser2003! Apologies for the inconvenience. This should be fixed now via https://github.com/mlc-ai/mlc-llm/pull/2187. Try npm version 0.2.34 with a newly compiled model.
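Concretely, that would look roughly like the following (a sketch, assuming the simple-chat example pulls @mlc-ai/web-llm from npm; the wasm also needs to be regenerated with the compile command shown earlier in this issue):
# Upgrade the web-llm package in the example app to the fixed release,
# then recompile model.wasm with an up-to-date mlc-llm and retry.
npm install @mlc-ai/web-llm@0.2.34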
@CharlieFRuan That's great, thanks.