
Failed to detect local GPU

Open · chensinit opened this issue 2 years ago · 1 comment

🐛 Bug

Hello. I tried to build a model, but my GPU is not detected and I get an error.

$ python build.py --hf-path=databricks/dolly-v2-3b --quantization q4f16_0 --target android --max-seq-len 768
Weights exist at dist/models/dolly-v2-3b, skipping download.
Using path "dist/models/dolly-v2-3b" for model "dolly-v2-3b"
Database paths: ['log_db/vicuna-v1-7b', 'log_db/rwkv-raven-3b', 'log_db/rwkv-raven-1b5', 'log_db/redpajama-3b-q4f16', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/redpajama-3b-q4f32']
Target configured: opencl -keys=opencl,gpu -max_num_threads=256 -max_shared_memory_per_block=16384 -max_threads_per_block=256 -texture_spatial_limit=16384 -thread_warp_size=1
Failed to detect local GPU, falling back to CPU as a target
Automatically using target for weight quantization: llvm -keys=cpu
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 1.4633262157440186 GB
Start storing to cache dist/dolly-v2-3b-q4f16_0/params
[0710/0710] saving param_709
All finished, 51 total shards committed, record saved to dist/dolly-v2-3b-q4f16_0/params/ndarray-cache.json
Save a cached module to dist/dolly-v2-3b-q4f16_0/mod_cache_before_build_android.pkl.
Dump static shape TIR to dist/dolly-v2-3b-q4f16_0/debug/mod_tir_static.py
Dump dynamic shape TIR to dist/dolly-v2-3b-q4f16_0/debug/mod_tir_dynamic.py

  • Dispatch to pre-scheduled op: fused_NT_matmul2_divide1_maximum1_minimum1_cast7
  • Dispatch to pre-scheduled op: fused_softmax1_cast8
  • Dispatch to pre-scheduled op: layer_norm1
  • Dispatch to pre-scheduled op: matmul8
  • Dispatch to pre-scheduled op: fused_NT_matmul_divide_maximum_minimum_cast2
  • Dispatch to pre-scheduled op: fused_NT_matmul1_add3_add5_add5
  • Dispatch to pre-scheduled op: matmul2
  • Dispatch to pre-scheduled op: fused_softmax_cast3
  • Dispatch to pre-scheduled op: fused_layer_norm1_cast6
  • Dispatch to pre-scheduled op: fused_NT_matmul1_add3_add5_add5_cast5
  • Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to

Finish exporting to dist/dolly-v2-3b-q4f16_0/dolly-v2-3b-q4f16_0-android.tar
Finish exporting chat config to dist/dolly-v2-3b-q4f16_0/params/mlc-chat-config.json
free(): invalid pointer
중지됨 (코어 덤프됨)  <--- Korean for "Aborted (core dumped)"

To Reproduce

Steps to reproduce the behavior:

python build.py --hf-path=databricks/dolly-v2-3b --quantization q4f16_0 --target android --max-seq-len 768

Expected behavior

The model build completes successfully.

Environment

  • Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): CUDA

  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu

  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): GTX 1080 Ti

  • How you installed MLC-LLM (conda, source): source

  • How you installed TVM-Unity (pip, source): source

  • Python version (e.g. 3.10): 3.10.9

  • GPU driver version (if applicable): latest (installed about one month ago)

  • CUDA/cuDNN version (if applicable): release 12.1, V12.1.105

  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models; see the backend-check sketch after this list):
    USE_GTEST: AUTO
    SUMMARIZE: OFF
    USE_IOS_RPC: OFF
    CUDA_VERSION: NOT-FOUND
    USE_LIBBACKTRACE: AUTO
    DLPACK_PATH: 3rdparty/dlpack/include
    USE_TENSORRT_CODEGEN: OFF
    USE_THRUST: OFF
    USE_TARGET_ONNX: OFF
    USE_AOT_EXECUTOR: ON
    USE_CUDNN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_CCACHE: AUTO
    USE_ARM_COMPUTE_LIB: OFF
    USE_CPP_RTVM: OFF
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    USE_MKL: OFF
    USE_PT_TVMDSOOP: OFF
    USE_CLML: OFF
    USE_STACKVM_RUNTIME: OFF
    USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
    ROCM_PATH: /opt/rocm
    USE_DNNL: OFF
    USE_VITIS_AI: OFF
    USE_LLVM: ON
    USE_VERILATOR: OFF
    USE_TF_TVMDSOOP: OFF
    USE_THREADS: ON
    USE_MSVC_MT: OFF
    BACKTRACE_ON_SEGFAULT: OFF
    USE_GRAPH_EXECUTOR: ON
    USE_ROCBLAS: OFF
    GIT_COMMIT_HASH: 3c6e82fb3bb6510c676aad807c79a8e519f57f5a
    USE_VULKAN: OFF
    USE_RUST_EXT: OFF
    USE_CUTLASS: OFF
    USE_CPP_RPC: OFF
    USE_HEXAGON: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_UMA: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_SORT: ON
    USE_RTTI: ON
    GIT_COMMIT_TIME: 2023-05-29 21:35:11 -0400
    USE_HEXAGON_SDK: /path/to/sdk
    USE_BLAS: none
    USE_ETHOSN: OFF
    USE_LIBTORCH: OFF
    USE_RANDOM: ON
    USE_CUDA: OFF
    USE_COREML: OFF
    USE_AMX: OFF
    BUILD_STATIC_RUNTIME: OFF
    USE_CMSISNN: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_TFLITE: OFF
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    PICOJSON_PATH: 3rdparty/picojson
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    INSTALL_DEV: OFF
    USE_PROFILER: ON
    USE_NNPACK: OFF
    LLVM_VERSION: 14.0.0
    USE_OPENCL: ON
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    RANG_PATH: 3rdparty/rang/include
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_OPENMP: none
    USE_BNNS: OFF
    USE_CUBLAS: OFF
    USE_METAL: OFF
    USE_MICRO_STANDALONE_RUNTIME: OFF
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_ALTERNATIVE_LINKER: AUTO
    USE_BYODT_POSIT: OFF
    USE_HEXAGON_RPC: OFF
    USE_MICRO: OFF
    DMLC_PATH: 3rdparty/dmlc-core/include
    INDEX_DEFAULT_I64: ON
    USE_RELAY_DEBUG: OFF
    USE_RPC: ON
    USE_TENSORFLOW_PATH: none
    TVM_CLML_VERSION:
    USE_MIOPEN: OFF
    USE_ROCM: OFF
    USE_PAPI: OFF
    USE_CURAND: OFF
    TVM_CXX_COMPILER_PATH: /usr/bin/c++
    HIDE_PRIVATE_SYMBOLS: OFF

  • Any other relevant information:
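A quick way to cross-check flags like USE_CUDA and USE_OPENCL above against what the Python runtime actually sees is a minimal sketch using TVM's public tvm.support.libinfo() and Device.exist APIs. Note that the dump above reports USE_CUDA: OFF, which would be consistent with the "Failed to detect local GPU" fallback during weight quantization. The filename check_backends.py is illustrative:

# check_backends.py -- report which GPU backends this TVM build enables
# and whether a matching local device is visible to the runtime.
import tvm

info = tvm.support.libinfo()
for key in ("USE_CUDA", "USE_OPENCL", "USE_VULKAN", "USE_METAL"):
    print(f"{key}: {info.get(key)}")

# Device.exist is False when the backend is compiled out or no device is found.
print("cuda device visible:", tvm.cuda(0).exist)
print("opencl device visible:", tvm.opencl(0).exist)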

Additional context

chensinit avatar Jun 05 '23 08:06 chensinit

free(): invalid pointer
중지됨 (코어 덤프됨)  <--- Korean for "Aborted (core dumped)"

This error is caused by symbol conflicts between TVM and PyTorch at program exit time, which you may safely ignore. According to the logs you shared, the build itself succeeded.
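If you would rather suppress the crash than ignore it, one commonly suggested workaround (a hedged sketch, not an official MLC-LLM fix) is to skip interpreter teardown once the build finishes, so the conflicting C++ static destructors never run. The wrapper filename and the assumption that build.py sits in the current directory are illustrative:

# run_build_noexit.py -- hypothetical wrapper around build.py.
# Runs the build, then exits via os._exit() so that atexit handlers and
# C++ static destructors (where the TVM/PyTorch symbol conflict triggers
# "free(): invalid pointer") are never invoked.
import os
import runpy
import sys

if __name__ == "__main__":
    # Forward our command-line arguments to build.py unchanged.
    sys.argv = ["build.py"] + sys.argv[1:]
    runpy.run_path("build.py", run_name="__main__")
    sys.stdout.flush()  # os._exit() skips normal buffer flushing
    sys.stderr.flush()
    os._exit(0)

It would be invoked the same way as the original command, e.g. python run_build_noexit.py --hf-path=databricks/dolly-v2-3b --quantization q4f16_0 --target android --max-seq-len 768.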

junrushao avatar Jun 05 '23 14:06 junrushao

Thank you!

chensinit avatar Jun 07 '23 03:06 chensinit