
TVMError: Cannot run module, architecture mismatch

Open kywen1119 opened this issue 1 year ago • 13 comments

When I run

```
python build.py --model ./dist/models/vicuna-7b --quantization q4f16_0 --target android --max-seq-len 768
```

I get the following error:

```
[18:13:12] /Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0
Traceback (most recent call last):
  File "/Users/wenkeyu1/Desktop/mlc-llm/build.py", line 417, in <module>
    main()
  File "/Users/wenkeyu1/Desktop/mlc-llm/build.py", line 395, in main
    mod = mod_transform_before_build(mod, params, ARGS)
  File "/Users/wenkeyu1/Desktop/mlc-llm/build.py", line 278, in mod_transform_before_build
    new_params = utils.transform_params(mod_transform, model_params, args)
  File "/Users/wenkeyu1/Desktop/mlc-llm/mlc_llm/utils.py", line 254, in transform_params
    vm = relax.vm.VirtualMachine(ex, device)
  File "/Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/python/tvm/runtime/relax_vm.py", line 96, in __init__
    self._setup_device(device, memory_cfg)
  File "/Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/python/tvm/runtime/relax_vm.py", line 137, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "/Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/src/target/llvm/llvm_module.cc", line 389
TVMError: Cannot run module, architecture mismatch
```

Could you please help me solve this problem? Thanks!

kywen1119 avatar Jun 06 '23 10:06 kywen1119

```
[18:13:12] /Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0
```

It seems indicative of the underlying issue. The module is of ARM architecture, while the host is x86_64.
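For anyone hitting this message: the `module=` half reports the target triple the artifact was compiled for, while the `host=` half reports the architecture of the Python/TVM process doing the build. A quick standard-library check of the host side (illustrative only; it reports what the running interpreter was built for):

```python
import platform

# The "host" half of TVM's message comes from the interpreter doing the
# build. These report that side (the "module" half is the triple the
# artifact was compiled for, e.g. arm64-apple-macos).
print(platform.machine())   # e.g. "x86_64" or "arm64"
print(platform.platform())  # e.g. "macOS-13.2-x86_64-i386-64bit"
```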

junrushao avatar Jun 06 '23 12:06 junrushao

Why did this happen? I just followed the instructions from ios/README.md.

kywen1119 avatar Jun 07 '23 01:06 kywen1119

Does that mean I can't run build.py for ios/android on Mac with intel cpu?

kywen1119 avatar Jun 07 '23 01:06 kywen1119

@kywen1119 You can definitely run build.py whether it's an Intel or an ARM MacBook, but I will need more details to help with your case. Could you share all the outputs from build.py? More specifically, these lines may be helpful to me: https://github.com/mlc-ai/mlc-llm/blob/476fed9400a2933a97b6ffaf8973a38788b8324f/mlc_llm/utils.py#L185-L187

junrushao avatar Jun 07 '23 12:06 junrushao

> @kywen1119 You can definitely run build.py whether it's an Intel or an ARM MacBook, but I will need more details to help with your case. Could you share all the outputs from build.py? More specifically, these lines may be helpful to me:
>
> https://github.com/mlc-ai/mlc-llm/blob/476fed9400a2933a97b6ffaf8973a38788b8324f/mlc_llm/utils.py#L185-L187

Same problem for me; stdout is:

```
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[13:41:20] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.4.0
```

My device is an Apple M1 Pro.

hermitgreen avatar Jun 09 '23 05:06 hermitgreen

My host is actually arm64 with an M1 chip, so why is host=x86_64?

hermitgreen avatar Jun 09 '23 05:06 hermitgreen

I have the same problem:

```
Weights exist at dist/models/dolly-v2-3b, skipping download.
Using path "dist/models/dolly-v2-3b" for model "dolly-v2-3b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
[15:00:03] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 5300M
[15:00:03] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Target configured: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[15:00:12] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin21.6.0
Traceback (most recent call last):
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 417, in <module>
    main()
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 395, in main
    mod = mod_transform_before_build(mod, params, ARGS)
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 278, in mod_transform_before_build
    new_params = utils.transform_params(mod_transform, model_params, args)
  File "/Users/dfq/Desktop/projects/git/mlc-llm/mlc_llm/utils.py", line 255, in transform_params
    vm = relax.vm.VirtualMachine(ex, device)
  File "/Users/dfq/anaconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 96, in __init__
    self._setup_device(device, memory_cfg)
  File "/Users/dfq/anaconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 137, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc", line 389
TVMError: Cannot run module, architecture mismatch
```

```
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[15:00:12] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin21.6.0
```

dfqddd avatar Jun 09 '23 07:06 dfqddd

@hermitgreen, @junrushao It's because even though you are running on an arm64 machine, your installed Anaconda is the x86_64 build. Check the value of `platform` after running `conda info`. What you want in this case is `osx-arm64`, not `osx-64`.

Install the macOS M1/M2 (Apple Silicon) version of Anaconda, create an arm64 environment, and try again.
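To see which build your conda environment's Python actually is (this is what ends up as TVM's host architecture), a minimal standard-library check:

```python
import platform

# An osx-64 (x86_64) conda environment on an Apple Silicon Mac runs
# under Rosetta 2, so this prints "x86_64" even though the hardware is
# arm64; a native osx-arm64 environment prints "arm64".
print(platform.machine())
```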

Calin-Mihnea avatar Jun 09 '23 07:06 Calin-Mihnea

I can confirm: I hit this problem, and fixing the environment resolved it for me.

Initially `conda info` showed `platform : osx-64`.

I then ran:

```
CONDA_SUBDIR=osx-arm64 conda create -n mlc-llm-env numpy -c conda-forge
conda activate mlc-llm-env
conda config --env --set subdir osx-arm64
```

I then reinstalled the packages from the README, and `python build.py --hf-path=databricks/dolly-v2-3b` worked for me.

ldnovak avatar Jun 09 '23 22:06 ldnovak

> @kywen1119 You can definitely run build.py whether it's an Intel or an ARM MacBook, but I will need more details to help with your case. Could you share all the outputs from build.py? More specifically, these lines may be helpful to me:
>
> https://github.com/mlc-ai/mlc-llm/blob/476fed9400a2933a97b6ffaf8973a38788b8324f/mlc_llm/utils.py#L185-L187

Hi, my full logs are listed here:

```
/bin/sh: lscpu: command not found
Using path "../../mlc-llm/dist/models/vicuna-7b" for model "vicuna-7b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
[10:41:14] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 5300M
[10:41:15] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[10:41:24] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0
Traceback (most recent call last):
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 417, in <module>
    main()
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 395, in main
    mod = mod_transform_before_build(mod, params, ARGS)
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 278, in mod_transform_before_build
    new_params = utils.transform_params(mod_transform, model_params, args)
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/mlc_llm/utils.py", line 255, in transform_params
    vm = relax.vm.VirtualMachine(ex, device)
  File "/Users/wenkeyu1/miniconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 96, in __init__
    self._setup_device(device, memory_cfg)
  File "/Users/wenkeyu1/miniconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 137, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc", line 389
TVMError: Cannot run module, architecture mismatch
```

kywen1119 avatar Jun 12 '23 02:06 kywen1119

```
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
```

Just wanted to double-check: are you compiling for Android or iOS? The target looks very much like iOS to me.

junrushao avatar Jun 12 '23 03:06 junrushao

> Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
>
> Just wanted to double-check: are you compiling for Android or iOS? The target looks very much like iOS to me.

For iOS:

```
python build.py --model ../../mlcllm/dist/models/vicuna-7b --quantization q3f16_0 --target iphone --max-seq-len 768
```

kywen1119 avatar Jun 12 '23 04:06 kywen1119

@kywen1119 The reason is that our quantization system currently assumes an ARM CPU on macOS: https://github.com/mlc-ai/mlc-llm/blob/8f1386fe5aeb4342e0d7287863a3b7b2a072ed13/mlc_llm/utils.py#L372

This assumption is unnecessarily strong. To support x86 CPUs, we need to auto-detect the local CPU architecture, which, fortunately, TVM's LLVM binding supports. I submitted a PR with the fix: https://github.com/mlc-ai/mlc-llm/pull/387

Let me know if it works for you!
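The idea behind the fix can be sketched as follows (illustrative only, not the code in the actual PR; the helper name, the triple strings, and the fallback are assumptions). Instead of hardcoding an arm64 triple for macOS, the host target is derived from the local machine:

```python
import platform

def host_llvm_target() -> str:
    """Derive an LLVM host target string from the local machine instead
    of hardcoding an arm64 triple for macOS (hypothetical helper)."""
    machine = platform.machine()   # "arm64" on Apple Silicon, "x86_64" on Intel
    if platform.system() == "Darwin":
        return f"llvm -mtriple={machine}-apple-macos"
    # Fall back to TVM's generic LLVM host target on other systems.
    return "llvm"

print(host_llvm_target())
```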

junrushao avatar Jun 12 '23 15:06 junrushao

> @kywen1119 The reason is that our quantization system currently assumes an ARM CPU on macOS:
>
> https://github.com/mlc-ai/mlc-llm/blob/8f1386fe5aeb4342e0d7287863a3b7b2a072ed13/mlc_llm/utils.py#L372
>
> This assumption is unnecessarily strong. To support x86 CPUs, we need to auto-detect the local CPU architecture, which, fortunately, TVM's LLVM binding supports. I submitted a PR with the fix: #387
>
> Let me know if it works for you!

Thanks for your patient reply! The PR works for me; I successfully converted the model.

kywen1119 avatar Jun 13 '23 01:06 kywen1119

Thanks!

junrushao avatar Jun 14 '23 05:06 junrushao