
[Bug] Yi model error: TVM runtime cannot find vm_load_executable

Open MrRace opened this issue 1 year ago • 7 comments

🐛 Bug

When deploying the Yi-6B-Chat model to an Android phone, an error occurred:

Error message:
InternalError: Check failed: (fload_exec.defined()) is false: TVM runtime cannot find vm_load_executable
Stack trace:
  File "/share/home/Repository/mlc-llm/cpp/llm_chat.cc", line 163.

To Reproduce

Steps to reproduce the behavior:

MODEL_NAME=Yi-6B-Chat
QUANTIZATION=q4f16_1

1. Convert the weights, generate the chat config, and compile the model library:

mlc_chat convert_weight /share/home/model_zoo/LLM/01-ai/$MODEL_NAME/ --quantization $QUANTIZATION -o dist/$MODEL_NAME-$QUANTIZATION-MLC/

mlc_chat gen_config /share/jiepeng.liu/model_zoo/LLM/01-ai/$MODEL_NAME/ --quantization $QUANTIZATION --conv-template chatml -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/

mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json --device android --system-lib-prefix yi_q4f16_1 -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar

The configuration file ./android/library/src/main/assets/app-config.json is as follows:

{
  "model_list": [
    {
      "model_url": "https://huggingface.co/01-ai/Yi-6B-Chat",
      "model_lib": "yi_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "yi-6b-chat-q4f16_1"
    }
  ],
  "model_lib_path_for_prepare_libs": {
    "yi_q4f16_1": "libs/Yi-6B-Chat-q4f16_1-android.tar"
  }
}
2. Bundle the model library:
cd ./android/library
./prepare_libs.sh
3. Build the Android app:
cd .. && ./gradlew assembleDebug
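
The same prefix string appears in several places across these steps: the --system-lib-prefix flag in step 1, and the "model_lib" value plus the model_lib_path_for_prepare_libs key in app-config.json. The later replies in this thread suggest these all need to refer to the same library, so a quick check before building the app can catch a mismatch early. A minimal sketch, reusing the paths above:

# Print every place the prefix occurs so a mismatch is easy to spot.
PREFIX=yi_q4f16_1   # the value passed to --system-lib-prefix in step 1
grep -n "$PREFIX" ./android/library/src/main/assets/app-config.json || echo "$PREFIX not referenced in app-config.json"
ls -l ./dist/libs/   # the compiled tar that model_lib_path_for_prepare_libs points at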

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Android
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04.3 LTS
  • How you installed MLC-LLM (conda, source): pip3 install /share/jiepeng.liu/tools/mlc_chat_nightly_cu122-0.1.dev974-cp311-cp311-manylinux_2_28_x86_64.whl -i https://mirrors.cloud.tencent.com/pypi/simple
  • How you installed TVM-Unity (pip, source): pip3 install /share/jiepeng.liu/tools/mlc_ai_nightly_cu122-0.15.dev99-cp311-cp311-manylinux_2_28_x86_64.whl -i https://mirrors.cloud.tencent.com/pypi/simple
  • Python version = 3.11
  • GPU driver version (if applicable): 535.129.03
  • CUDA/cuDNN version (if applicable): 12.2
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: 12.2
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: ON
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: ON
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: f06d486b4a1a27f0bbb072688a5fc41e7b15323c
USE_VULKAN: ON
USE_RUST_EXT: OFF
USE_CUTLASS: ON
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-08 02:04:22 -0500
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: ON
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: ON
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON

MrRace (Mar 14 '24)

Please post your mlc_config.json.

Mawriyo (Mar 14 '24)

@Mawriyo Thanks for your reply. Here is the content of mlc_config.json:

{
  "model_type": "llama",
  "quantization": "q4f16_1",
  "model_config": {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "rms_norm_eps": 1e-05,
    "vocab_size": 64000,
    "position_embedding_base": 5000000.0,
    "context_window_size": 4096,
    "prefill_chunk_size": 4096,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "tensor_parallel_shards": 1,
    "max_batch_size": 80
  },
  "vocab_size": 64000,
  "context_window_size": 4096,
  "sliding_window_size": -1,
  "prefill_chunk_size": 4096,
  "attention_sink_size": -1,
  "tensor_parallel_shards": 1,
  "mean_gen_len": 128,
  "max_gen_len": 512,
  "shift_fill_factor": 0.3,
  "temperature": 0.6,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "repetition_penalty": 1.0,
  "top_p": 0.8,
  "conv_template": "chatml",
  "pad_token_id": 0,
  "bos_token_id": 6,
  "eos_token_id": 7,
  "tokenizer_files": [
    "tokenizer.model",
    "tokenizer_config.json",
    "tokenizer.json"
  ],
  "version": "0.1.0"
}

MrRace (Mar 15 '24)

What's odd is that your mlc-config is using llama. If you compiled this mlc-config from

mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json --device android --system-lib-prefix yi_q4f16_1 -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar

then the issue you have is that MODEL_NAME and QUANTIZATION are still linked to llama. Try explicitly filling those values in, or export those variables for the target model. Hope this helps!
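
A minimal sketch of that suggestion, reusing the variable names from the reproduce steps (whether they were actually unset in the failing run is an assumption):

# Export the variables so every mlc_chat invocation in this shell expands them correctly.
export MODEL_NAME=Yi-6B-Chat
export QUANTIZATION=q4f16_1
mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json --device android --system-lib-prefix yi_q4f16_1 -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar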

Mawriyo (Mar 15 '24)

@MrRace Can you please try replacing yi_q4f16_1 with llama_q4f16_1 in both places in app-config.json? The model_lib format should be <model_type>_<quantization> (as given in the mlc-chat-config.json file).
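
One way to apply that suggestion end to end (a sketch; the paths are taken from the reproduce steps, and llama comes from the "model_type" field in the posted config): keep the --system-lib-prefix value, the "model_lib" field, and the model_lib_path_for_prepare_libs key on the same string.

# Recompile the model library under the llama_q4f16_1 prefix...
mlc_chat compile ./dist/Yi-6B-Chat-q4f16_1-MLC/mlc-chat-config.json --device android --system-lib-prefix llama_q4f16_1 -o ./dist/libs/Yi-6B-Chat-q4f16_1-android.tar
# ...point both occurrences in app-config.json at the same prefix...
sed -i 's/yi_q4f16_1/llama_q4f16_1/g' ./android/library/src/main/assets/app-config.json
# ...then re-run prepare_libs.sh and rebuild the app.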

Kartik14 (Mar 18 '24)

@MrRace Just wondering how this issue is going. Do you still see the error after applying what @Kartik14 mentioned? #1993 also introduces something that may be helpful.

MasterJH5574 (Mar 25 '24)

@MasterJH5574: I'm trying to compile a fine-tuned version of the Gemma 2B model to run on Android, but unfortunately my system doesn't support CMake, so I can't run the prepare_libs.sh script.

Could you kindly help by running the prepare_libs.sh script with the app-config.json and Android tar files below, and share the resulting JAR files? I'd really appreciate your help here.

prebuilt_libs.zip app-config.json

NSTiwari (Mar 27 '24)

@Mawriyo @MasterJH5574: Can you please help?

NSTiwari (Mar 28 '24)

We do depend on CMake to support Android; we recommend using a conda environment to enable such cases.
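
For reference, a minimal sketch of that route (the environment name is made up, and prepare_libs.sh still needs the Android NDK and the other prerequisites from the docs):

# Get CMake (and Ninja) from conda-forge instead of the system package manager,
# then run the library preparation script from inside that environment.
conda create -n mlc-android -c conda-forge cmake ninja -y
conda activate mlc-android
cd ./android/library && ./prepare_libs.sh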

tqchen (May 11 '24)