
[Bug] gemma-2b for Android. OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE

Open qc903113684 opened this issue 1 year ago • 7 comments

🐛 Bug

Compiled Gemma-2b for Android with q4f16_0 quantization. The model loads successfully, but chat fails with: OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE Stack trace: File "/home/chaoqin/mlcllm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90

To Reproduce

Steps to reproduce the behavior:

  1. Compile gemma-2b with q4f16_0 quantization, targeting Android (a hedged command sketch follows this list).
  2. Compile the Android JAR.
  3. Build the app with Android Studio.
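
A sketch of step 1, assuming the early-2024 mlc_llm CLI (convert_weight / gen_config / compile subcommands), driven from Python for illustration; the paths, output names, and the Gemma conversation-template name are assumptions and may differ by version:

```python
# Hypothetical reproduction of the model-compilation step; paths and the
# "gemma_instruction" template name are illustrative assumptions.
import subprocess

model = "dist/models/gemma-2b-it"   # assumed local checkout of the weights
out = "dist/gemma-2b-q4f16_0"

# Quantize the weights to q4f16_0.
subprocess.run(["mlc_llm", "convert_weight", model,
                "--quantization", "q4f16_0", "-o", out], check=True)
# Generate mlc-chat-config.json for the quantized model.
subprocess.run(["mlc_llm", "gen_config", model,
                "--quantization", "q4f16_0",
                "--conv-template", "gemma_instruction", "-o", out], check=True)
# Compile the model library for Android (steps 2-3 then package it).
subprocess.run(["mlc_llm", "compile", f"{out}/mlc-chat-config.json",
                "--device", "android",
                "-o", f"{out}/gemma-2b-q4f16_0-android.tar"], check=True)
```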

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Android
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Android Qualcomm Snapdragon 865
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.10
  • GPU driver version (if applicable): 535.86.05
  • CUDA/cuDNN version (if applicable): CUDA 11.8
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): USE_NVTX: OFF USE_GTEST: AUTO SUMMARIZE: OFF USE_IOS_RPC: OFF USE_MSC: OFF USE_ETHOSU: CUDA_VERSION: NOT-FOUND USE_LIBBACKTRACE: AUTO DLPACK_PATH: 3rdparty/dlpack/include USE_TENSORRT_CODEGEN: OFF USE_THRUST: OFF USE_TARGET_ONNX: OFF USE_AOT_EXECUTOR: ON BUILD_DUMMY_LIBTVM: OFF USE_CUDNN: OFF USE_TENSORRT_RUNTIME: OFF USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF USE_CCACHE: AUTO USE_ARM_COMPUTE_LIB: OFF USE_CPP_RTVM: USE_OPENCL_GTEST: /path/to/opencl/gtest USE_MKL: OFF USE_PT_TVMDSOOP: OFF MLIR_VERSION: NOT-FOUND USE_CLML: OFF USE_STACKVM_RUNTIME: OFF USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF ROCM_PATH: /opt/rocm USE_DNNL: OFF USE_VITIS_AI: OFF USE_MLIR: OFF USE_RCCL: OFF USE_LLVM: llvm-config --ignore-libllvm --link-static USE_VERILATOR: OFF USE_TF_TVMDSOOP: OFF USE_THREADS: ON USE_MSVC_MT: OFF BACKTRACE_ON_SEGFAULT: OFF USE_GRAPH_EXECUTOR: ON USE_NCCL: OFF USE_ROCBLAS: OFF GIT_COMMIT_HASH: 79991133c17bb8685185e1f03cc2f688ea37c974 USE_VULKAN: ON USE_RUST_EXT: OFF USE_CUTLASS: OFF USE_CPP_RPC: OFF USE_HEXAGON: OFF USE_CUSTOM_LOGGING: OFF USE_UMA: OFF USE_FALLBACK_STL_MAP: OFF USE_SORT: ON USE_RTTI: ON GIT_COMMIT_TIME: 2024-02-21 22:31:30 -0500 USE_HEXAGON_SDK: /path/to/sdk USE_BLAS: none USE_ETHOSN: OFF USE_LIBTORCH: OFF USE_RANDOM: ON USE_CUDA: OFF USE_COREML: OFF USE_AMX: OFF BUILD_STATIC_RUNTIME: OFF USE_CMSISNN: OFF USE_KHRONOS_SPIRV: OFF USE_CLML_GRAPH_EXECUTOR: OFF USE_TFLITE: OFF USE_HEXAGON_GTEST: /path/to/hexagon/gtest PICOJSON_PATH: 3rdparty/picojson USE_OPENCL_ENABLE_HOST_PTR: OFF INSTALL_DEV: OFF USE_PROFILER: ON USE_NNPACK: OFF LLVM_VERSION: 15.0.7 USE_MRVL: OFF USE_OPENCL: OFF COMPILER_RT_PATH: 3rdparty/compiler-rt RANG_PATH: 3rdparty/rang/include USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF USE_OPENMP: OFF USE_BNNS: OFF USE_CUBLAS: OFF USE_METAL: OFF USE_MICRO_STANDALONE_RUNTIME: OFF USE_HEXAGON_EXTERNAL_LIBS: OFF USE_ALTERNATIVE_LINKER: AUTO USE_BYODT_POSIT: OFF USE_HEXAGON_RPC: OFF USE_MICRO: OFF DMLC_PATH: 3rdparty/dmlc-core/include INDEX_DEFAULT_I64: ON USE_RELAY_DEBUG: OFF USE_RPC: ON USE_TENSORFLOW_PATH: none TVM_CLML_VERSION: USE_MIOPEN: OFF USE_ROCM: OFF USE_PAPI: OFF USE_CURAND: OFF TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++ HIDE_PRIVATE_SYMBOLS: ON
  • Any other relevant information:

Additional context

  1. Gemma-2b with the same code and environment works on a Qualcomm 8 Gen 2 device, but chat fails on the Snapdragon 865.
  2. A compiled qwen-1.8b works on the Snapdragon 865, so I think this error is related to Gemma's implementation.
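
Point 1 is consistent with a per-device work-group limit: different Adreno generations can report different CL_DEVICE_MAX_WORK_GROUP_SIZE and per-kernel limits, so a launch configuration that is valid on the 8 Gen 2 may be rejected on the Snapdragon 865. A minimal pyopencl sketch (an illustration, not part of mlc-llm) that dumps the limits the OpenCL runtime validates against:

```python
# Dump the work-group limits that clEnqueueNDRangeKernel checks.
# Run wherever an OpenCL driver is available; values differ per GPU.
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(dev.name)
        print("  CL_DEVICE_MAX_WORK_GROUP_SIZE:", dev.max_work_group_size)
        print("  CL_DEVICE_MAX_WORK_ITEM_SIZES:", dev.max_work_item_sizes)
```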

qc903113684 avatar Feb 27 '24 08:02 qc903113684

I am having this issue as well, but with all the 7B models. It cannot be a memory issue, as 12GB should be more than enough RAM for any of these models (and it is not an allocation or out-of-range error), so I suspect it is a matrix-multiplication issue in whatever library is being used (so OpenCL), where a kernel is launched with a work-group size above the device's maximum number of work items. I haven't looked at opencl_module.cc yet, but my suspicion is that some dynamic allocation logic is producing an invalid launch configuration.

I cannot think of why this would be happening, but I might pull the source and see what I can do about it. For now, my recommendation would be to try different models and see if any of them work for you; in my case, all models other than the 7B ones work. It might be different on your end. Hopefully this gets patched.
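
That hypothesis matches what error -54 means: clEnqueueNDRangeKernel rejects a launch whose local work size exceeds the device's or the kernel's work-group limit (or does not evenly divide the global size on OpenCL 1.x). A minimal pyopencl sketch that provokes the same error on any machine with an OpenCL driver (the kernel and sizes here are illustrative, not mlc-llm's actual kernels):

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]
queue = cl.CommandQueue(ctx)

prg = cl.Program(ctx, """
__kernel void copy(__global const float *src, __global float *dst) {
    int i = get_global_id(0);
    dst[i] = src[i];
}
""").build()

# Per-kernel limit; may be lower than the device-wide maximum.
kernel_wg = prg.copy.get_work_group_info(
    cl.kernel_work_group_info.WORK_GROUP_SIZE, dev)

n = 4096
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                hostbuf=np.zeros(n, dtype=np.float32))
dst = cl.Buffer(ctx, mf.WRITE_ONLY, 4 * n)

try:
    # Deliberately oversized local work size: this is exactly the
    # condition under which the enqueue returns -54.
    prg.copy(queue, (n,), (int(kernel_wg) * 2,), src, dst)
except cl.Error as e:
    print("launch rejected:", e)  # CL_INVALID_WORK_GROUP_SIZE
```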

bulutthecat avatar Mar 16 '24 01:03 bulutthecat

Hi @bulutthecat @qc903113684 apologies for the inconvenience. Could you check whether https://github.com/mlc-ai/mlc-llm/pull/1955 was included when you ran into this issue? Or perhaps try again with the latest package? I suspect that this is fixed via https://github.com/mlc-ai/mlc-llm/pull/1955. Thank you!
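
For anyone verifying this, the installed TVM wheel records the commit it was built from, so one quick check is to compare the recorded hash and timestamp against the merge commit of the PR in question; a sketch using the same libinfo call as the issue template:

```python
import tvm

info = tvm.support.libinfo()
# Compare against the merge commit / merge date of the PR (e.g. #1955)
# on GitHub to see whether the fix can be in this build.
print("GIT_COMMIT_HASH:", info["GIT_COMMIT_HASH"])
print("GIT_COMMIT_TIME:", info["GIT_COMMIT_TIME"])
```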

CharlieFRuan avatar Mar 18 '24 02:03 CharlieFRuan

@qc903113684 Unfortunately, I am unable to reproduce it on my end. Can you please build tvm and mlc again after fetching the latest changes and then recompile the model library?

Kartik14 avatar Mar 18 '24 06:03 Kartik14

> Hi @bulutthecat @qc903113684 apologies for the inconvenience. Could you check whether #1955 was included when you ran into this issue? Or perhaps try again with the latest package? I suspect that this is fixed via #1955. Thank you!

Thanks for letting me know, I will get back to you if it works.

bulutthecat avatar Mar 18 '24 12:03 bulutthecat

This PR may have fixed the problem; I haven't had time to test it yet: https://github.com/mlc-ai/mlc-llm/pull/1850

qc903113684 avatar Mar 19 '24 07:03 qc903113684

Hi @qc903113684, #1850 is superseded by #1822, which was merged 3 weeks ago.

i.e., #1822 and #1955 are both potential fixes for the problem described in this issue.

CharlieFRuan avatar Mar 19 '24 15:03 CharlieFRuan

Got the same error:

MLCChat failed

Stack trace: org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE Stack trace: File "/Users/kartik/mlc/tvm/src/runtime/opencl/opencl_module.cc", line 90

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.prefill(ChatModule.java:54)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$1.invoke(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$1.invoke(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:462)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:919)

Error message: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE Stack trace: File "/Users/kartik/mlc/tvm/src/runtime/opencl/opencl_module.cc", line 90

sinaSPOGames avatar Mar 30 '24 04:03 sinaSPOGames