mlc-llm
TVMError: Cannot run module, architecture mismatch
When I run "python build.py --model ./dist/models/vicuna-7b --quantization q4f16_0 --target android --max-seq-len 768", I get an error like:
[18:13:12] /Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0
Traceback (most recent call last):
File "/Users/wenkeyu1/Desktop/mlc-llm/build.py", line 417, in
This seems indicative of the underlying issue: the module is built for the ARM architecture, while the host is x86_64.
Why did this happen? I just followed the instructions from ios/README.md.
Does that mean I can't run build.py for iOS/Android on a Mac with an Intel CPU?
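For anyone hitting this, a quick way to check which architecture your Python itself was built for (as opposed to what the hardware is) is something like:

```python
import platform

# Architecture the running Python interpreter was built for.
# On an Apple Silicon Mac this prints "x86_64" when Python runs under
# Rosetta (e.g. from an x86_64 Anaconda install) and "arm64" for a
# native build -- exactly the kind of mismatch TVM is complaining about.
print(platform.machine())

# Rough analogue of the "host=x86_64-apple-darwin..." part of the TVM log.
print(f"{platform.machine()}-{platform.system().lower()}")
```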
@kywen1119 You can definitely run build.py on either an Intel or an ARM MacBook without problems, but I will need more details to help with your case. Could you share all of the output from build.py? More specifically, these lines may be helpful to me: https://github.com/mlc-ai/mlc-llm/blob/476fed9400a2933a97b6ffaf8973a38788b8324f/mlc_llm/utils.py#L185-L187
Same problem for me; stdout is:
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[13:41:20] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.4.0
My device is an Apple M1 Pro.
My host is actually arm64 with an M1 chip, so why does it say host=x86_64?
I have the same problem.
Weights exist at dist/models/dolly-v2-3b, skipping download.
Using path "dist/models/dolly-v2-3b" for model "dolly-v2-3b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
[15:00:03] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 5300M
[15:00:03] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Target configured: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[15:00:12] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin21.6.0
Traceback (most recent call last):
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 417, in <module>
    main()
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 395, in main
    mod = mod_transform_before_build(mod, params, ARGS)
  File "/Users/dfq/Desktop/projects/git/mlc-llm/build.py", line 278, in mod_transform_before_build
    new_params = utils.transform_params(mod_transform, model_params, args)
  File "/Users/dfq/Desktop/projects/git/mlc-llm/mlc_llm/utils.py", line 255, in transform_params
    vm = relax.vm.VirtualMachine(ex, device)
  File "/Users/dfq/anaconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 96, in __init__
    self._setup_device(device, memory_cfg)
  File "/Users/dfq/anaconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 137, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc", line 389
TVMError: Cannot run module, architecture mismatch
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[15:00:12] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin21.6.0
@hermitgreen, @junrushao
It's because even though you are running on ARM64 hardware, your installed Anaconda is the x86_64 build.
Check the value of platform after running conda info. What you want in this case is osx-arm64, not osx-64.
Install the macOS M1/M2 version of Anaconda, create an ARM64 environment, and try again.
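For example (assuming conda is on your PATH), you can compare the hardware architecture the kernel reports against the platform conda was built for:

```shell
# Hardware architecture as reported by the kernel ("arm64" on Apple Silicon):
uname -m

# conda's own platform: "osx-64" means an x86_64 (Intel or Rosetta) install,
# "osx-arm64" means a native Apple Silicon install.
command -v conda >/dev/null && conda info | grep platform || true
```

If the two disagree (arm64 hardware but osx-64 conda), that is the mismatch in the error above.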
I can confirm. I hit this problem, and setting the environment fixed it for me.
Initially, conda info showed platform : osx-64.
I then ran:
CONDA_SUBDIR=osx-arm64 conda create -n mlc-llm-env numpy -c conda-forge
conda activate mlc-llm-env
conda config --env --set subdir osx-arm64
Then I reinstalled the packages from the readme, and python build.py --hf-path=databricks/dolly-v2-3b worked for me.
Hi, my whole log is listed here:
/bin/sh: lscpu: command not found
Using path "../../mlc-llm/dist/models/vicuna-7b" for model "vicuna-7b"
Database paths: ['log_db/rwkv-raven-3b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/vicuna-v1-7b']
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
[10:41:14] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=AMD Radeon Pro 5300M
[10:41:15] /Users/runner/work/package/package/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 1, name=Intel(R) UHD Graphics 630
Automatically using target for weight quantization: metal -keys=metal,gpu -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=1024 -thread_warp_size=32
[10:41:24] /Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0
Traceback (most recent call last):
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 417, in <module>
    main()
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 395, in main
    mod = mod_transform_before_build(mod, params, ARGS)
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/build.py", line 278, in mod_transform_before_build
    new_params = utils.transform_params(mod_transform, model_params, args)
  File "/Users/wenkeyu1/Desktop/desk/mlc-llm/mlc_llm/utils.py", line 255, in transform_params
    vm = relax.vm.VirtualMachine(ex, device)
  File "/Users/wenkeyu1/miniconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 96, in __init__
    self._setup_device(device, memory_cfg)
  File "/Users/wenkeyu1/miniconda3/envs/mlc-llm-env/lib/python3.11/site-packages/tvm/runtime/relax_vm.py", line 137, in _setup_device
    self.module["vm_initialization"](*init_args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/runner/work/package/package/tvm/src/target/llvm/llvm_module.cc", line 389
TVMError: Cannot run module, architecture mismatch
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
Just wanted to double-check: are you compiling for Android or iOS? The target looks very much like iOS to me.
For iOS:
python build.py --model ../../mlcllm/dist/models/vicuna-7b --quantization q3f16_0 --target iphone --max-seq-len 768
@kywen1119 The reason is that our quantization system currently assumes an ARM CPU is used on macOS: https://github.com/mlc-ai/mlc-llm/blob/8f1386fe5aeb4342e0d7287863a3b7b2a072ed13/mlc_llm/utils.py#L372
This assumption is unnecessarily strong. To support x86 CPUs, we need auto-detection of the local CPU architecture, which, fortunately, TVM's LLVM binding supports. I submitted a PR with the fix: https://github.com/mlc-ai/mlc-llm/pull/387
Let me know if it works for you!
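Conceptually, the fix replaces the hard-coded arm64 triple with host detection. A simplified sketch of the idea (illustration only, not the actual PR code, which queries TVM's LLVM binding for the host triple):

```python
import platform

def host_cpu_target_triple() -> str:
    """Build an LLVM-style target triple from the local machine instead of
    hard-coding arm64 on macOS. On an Intel Mac this yields
    "x86_64-apple-macos" rather than the wrong "arm64-apple-macos"."""
    machine = platform.machine()  # "arm64" on Apple Silicon, "x86_64" on Intel
    if platform.system() == "Darwin":
        return f"{machine}-apple-macos"
    # Non-macOS fallback for this sketch.
    return f"{machine}-unknown-linux-gnu"
```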
Thanks for your patient reply! The PR works for me; I successfully converted the model.
Thanks!