
[Bug] Still Experiencing 'Error: Using LLVM 19.1.3 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`'

Open BuildBackBuehler opened this issue 11 months ago • 2 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Build from source using MLC's TVM-Relax and MLC-LLM GitHub repos. I used a few custom options, if that matters.

I believe the options were: USE_COREML (ON), USE_METAL (ON), USE_LLVM (the custom value the docs dictate, `llvm-config --ignore-libllvm --link-static`), MSGPACK CXX20 (ON), MSGPACK Use Boost (ON), SPM Use Shared (ON), SPM Use TCMalloc (ON), TVM_DEBUG_WITH_ABI_CHANGE (ON), TVM_LOG_BEFORE_THROW (ON), USE_BLAS (apple), USE_BNNS (ON), INSTALL_DEV (ON), SUMMARIZE (ON), HIDE_PRIVATE_SYMBOLS (ON); a rough sketch of the configure step is below.

I'm not sure whether any of these are TVM-specific; I was just going down the option list of my MLC build.
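For concreteness, the configure step looked roughly like this (a sketch, not my exact invocation: I actually set the options interactively in ccmake, and I've only included the flags whose names I can confirm from the libinfo dump below):

```bash
# Sketch of the TVM configure/build. Options were really set via `ccmake ..`,
# shown here as -D flags for clarity; the repo path is mine.
cd /Users/zack/.home/gitrepos/LLMLife/backend/tvm
mkdir -p build && cd build
cmake .. \
  -DUSE_METAL=ON \
  -DUSE_COREML=ON \
  -DUSE_LLVM="llvm-config --ignore-libllvm --link-static" \
  -DUSE_BLAS=apple \
  -DUSE_BNNS=ON \
  -DTVM_DEBUG_WITH_ABI_CHANGE=ON \
  -DTVM_LOG_BEFORE_THROW=ON \
  -DINSTALL_DEV=ON \
  -DSUMMARIZE=ON \
  -DHIDE_PRIVATE_SYMBOLS=ON
cmake --build . -j10
```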

  2. Run literally any mlc_llm command. I tried to use it in spite of the error, using a model I compiled a few months back for chat:

```
[2024-12-01 07:50:07] INFO auto_device.py:88: Not found device: cuda:0
[2024-12-01 07:50:08] INFO auto_device.py:88: Not found device: rocm:0
[2024-12-01 07:50:09] INFO auto_device.py:79: Found device: metal:0
[2024-12-01 07:50:11] INFO auto_device.py:88: Not found device: vulkan:0
[2024-12-01 07:50:12] INFO auto_device.py:88: Not found device: opencl:0
[2024-12-01 07:50:12] INFO auto_device.py:35: Using device: metal:0
[2024-12-01 07:50:12] INFO engine_base.py:143: Using library model: /Users/zack/.home/local/models/Llama_q3/mlc.dylib
[07:50:13] /Users/zack/.home/gitrepos/LLMLife/frontend/mlc-llm/cpp/serve/config.cc:688: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 2048.
[07:50:13] /Users/zack/.home/gitrepos/LLMLife/frontend/mlc-llm/cpp/serve/config.cc:688: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 32768, prefill chunk size will be set to 2048.
[07:50:13] /Users/zack/.home/gitrepos/LLMLife/frontend/mlc-llm/cpp/serve/config.cc:688: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 32768, prefill chunk size will be set to 2048.
[07:50:13] /Users/zack/.home/gitrepos/LLMLife/frontend/mlc-llm/cpp/serve/config.cc:769: The actual engine mode is "interactive". So max batch size is 1, max KV cache token capacity is 32768, prefill chunk size is 2048.
[07:50:13] /Users/zack/.home/gitrepos/LLMLife/frontend/mlc-llm/cpp/serve/config.cc:774: Estimated total single GPU memory usage: 41979.309 MB (Parameters: 30304.259 MB. KVCache: 10361.055 MB. Temporary buffer: 1313.996 MB). The actual usage might be slightly larger than the estimated number.
[07:50:32] /Users/zack/.home/gitrepos/LLMLife/backend/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2666: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
```

Stack trace:

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/zack/.home/local/mise/installs/python/3.11.9/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/Users/zack/.home/local/mise/installs/python/3.11.9/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/zack/.home/gitrepos/LLMLife/backend/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 245, in __call__
    raise_last_ffi_error()
  File "/Users/zack/.home/gitrepos/LLMLife/backend/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "/Users/zack/.home/gitrepos/LLMLife/backend/tvm/src/runtime/relax_vm/paged_kv_cache.cc", line 2666
TVMError: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
```
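My guess is that the KV-cache check fails because mlc.dylib was compiled months ago against an older TVM, whose paged-KV-cache constructor took a different number of arguments than the current runtime expects. If that's right, recompiling the model library with the current toolchain should clear this particular error; a sketch, using my paths and (I assume) the stock `mlc_llm compile` CLI:

```bash
# Rebuild the model library so its generated KV-cache constructor call
# matches the runtime's expected arity (paths are mine; adjust as needed).
mlc_llm compile /Users/zack/.home/local/models/Llama_q3/mlc-chat-config.json \
  --device metal \
  -o /Users/zack/.home/local/models/Llama_q3/mlc.dylib
```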

Expected behavior

Well, I expected that with my limited number of customized build options I wouldn't run into trouble; but considering that the other person's bug report on this error was closed, I imagine this may be related to it. While I'm here, I may as well also ask what the status of USE_MPS is. I didn't use it because it caused problems in the past, and it sounded like it was being phased out anyway.

Environment

  • How you installed MLC-LLM/TVM-Unity (conda, source): Source, Poetry

  • Python version (e.g. 3.10): 3.11.9

  • TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):

```
[07:53:41] /Users/zack/.home/gitrepos/LLMLife/backend/tvm/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.3 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[07:53:41] /Users/zack/.home/gitrepos/LLMLife/backend/tvm/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.3 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[07:53:41] /Users/zack/.home/gitrepos/LLMLife/backend/tvm/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.3 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: OFF
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_THRUST: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: OFF
USE_OPENCL_GTEST: /path/to/opencl/gtest
TVM_LOG_BEFORE_THROW: ON
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_MSCCL: OFF
USE_NNAPI_RUNTIME: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: e6b2a55d1e1668d889ce69efa3921bc73dcb8b8a
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-11-20 23:38:22 -0500
USE_HIPBLAS: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: ON
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 19.1.3
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
USE_NNAPI_CODEGEN: OFF
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: none
USE_BNNS: OFF
USE_FLASHINFER: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_NVSHMEM: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/homebrew/opt/llvm/bin/clang++
HIDE_PRIVATE_SYMBOLS: OFF
```

  • Any other relevant information: Well, it's odd, because my options above, which I set with `ccmake ..` before running `cmake --build . -j10`, don't appear to be honored: the libinfo dump reports USE_BNNS: OFF, INSTALL_DEV: OFF, SUMMARIZE: OFF, HIDE_PRIVATE_SYMBOLS: OFF, and USE_BLAS: none, even though I set all of those. I also forgot that I set USE_OPENMP (ON), though I don't know whether that one needs to be set to a file path to work (the dump reports USE_OPENMP: none). One way to check what the configure step actually recorded is sketched below.
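A quick way to see what CMake actually recorded is to grep the cache (the build-directory path here is an assumption based on my layout):

```bash
# Print the cached values of the options in question; entries look like
# USE_BNNS:STRING=ON (or :BOOL=ON) if the setting was recorded.
grep -E '^(USE_OPENMP|USE_BNNS|USE_BLAS|USE_COREML|INSTALL_DEV|SUMMARIZE|HIDE_PRIVATE_SYMBOLS)' \
  /Users/zack/.home/gitrepos/LLMLife/backend/tvm/build/CMakeCache.txt
```

If the cache says ON but libinfo reports OFF, then presumably the Python package is loading a stale or different libtvm than the one just built.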

I last compiled this on Friday, using the latest sources at that time (with `git pull --recurse-submodules`, to boot).
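As for the `-mcpu=apple-latest` error itself: LLVM 19 evidently no longer accepts `apple-latest` as a CPU name for `arm64-apple-macos`, so TVM falls back to `-mcpu=generic`. One thing worth checking (my assumption, not a confirmed fix) is which Apple CPU names this LLVM build does accept, using the same Homebrew clang that TVM was built with:

```bash
# List the CPU names this clang/LLVM accepts for the Apple Silicon triple;
# in LLVM 19, apple-latest should be gone while apple-m1/apple-m2/etc. remain.
/opt/homebrew/opt/llvm/bin/clang --print-supported-cpus \
  -target arm64-apple-macos 2>&1 | grep apple
```

If apple-m1 (or similar) is listed, pinning it explicitly in the target string instead of letting TVM guess should silence the error, assuming the compile CLI exposes a host-target override.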

BuildBackBuehler · Dec 01 '24 12:12