Ascend NPU 310P3 chip run error
I am trying to run whisper.cpp with the CANN backend on an Ascend 310P3 NPU. My CANN version is 8.0.
I compiled with these commands:
mkdir build
cd build
cmake .. -D GGML_CANN=on
make -j
and ran inference with: ./build/bin/main -f samples/jfk.wav -m models/ggml-base.en.bin
The program failed to run. The output is:
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: using CANN backend
whisper_init_state: kv self size = 18.87 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 14.48 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (conv) = 16.75 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 124.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (encode) = 131.94 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 3.43 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (cross) = 5.17 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 140.16 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 4.38 MiB
whisper_init_state: compute buffer (decode) = 153.13 MB
system_info: n_threads = 1 / 96 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 1
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 1 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
CANN error: EZ9999: Inner Error!
EZ9999: 2024-08-19-14:22:06.035.091 The error from device(6), serial number is 13, there is an aivec error, core id is 0, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1f55953f, mte error info: 0x91, ifu error info: 0x37b7ddcfefe80, ccu error info: 0x6b8e2406002b3b8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
TraceBack (most recent call last):
The error from device(6), serial number is 13, there is an aivec error, core id is 1, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x16c7fa7b, mte error info: 0x91, ifu error info: 0xf597fa430f00, ccu error info: 0x6b8e2406000fcf8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 2, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1d0fc8f1, mte error info: 0x91, ifu error info: 0x60ebe2bbe800, ccu error info: 0x6b8e2406004cd98e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 3, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1f1ef1eb, mte error info: 0x91, ifu error info: 0x2df5cebffff80, ccu error info: 0x6b8e2406004bbb8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 4, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1be2db77, mte error info: 0x91, ifu error info: 0x38754f9295d80, ccu error info: 0x6b8e2406007e538e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 5, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1ffddffb, mte error info: 0x91, ifu error info: 0x33ab3f3ffee80, ccu error info: 0x6b8e2406005cf78e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 6, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1bfe75e7, mte error info: 0x91, ifu error info: 0x35e0b72cebd80, ccu error info: 0x6b8e24060018e08e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The device(6), core list[0-6], error code is:[FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:586]
coreId( 0): 0x10 0x10 0x10 0x10 [FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:586]
coreId( 4): 0x10 0x10 0x10 [FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:600]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinic_kernel_task.cc][LINE:1220]
Aicore kernel execute failed, device_id=0, stream_id=17, report_stream_id=17, task_id=86, flip_num=0, fault kernel_name=ascendc_dup_by_rows_fp32_to_fp16_3, fault kernel info ext=none, program id=0, hash=6170300059213965033.[FUNC:GetError][FILE:stream.cc][LINE:1082]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1082]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[0] PID[272965]:
EXCEPTION STREAM:
Exception info:TGID=563630, model id=65535, stream id=17, stream phase=SCHEDULE
Message info[0]:RTS_HWTS: Vector core exception, slot_id=3, stream_id=17
Other info[0]:time=2024-08-19-14:21:56.320.060, function=process_hwts_error_exception, line=1320, error code=0x31
current device: 0, in function ggml_backend_cann_synchronize at /home/code/whisper.cpp.src/ggml/src/ggml-cann.cpp:1591
aclrtSynchronizeStream(cann_ctx->stream())
/home/code/whisper.cpp.src/ggml/src/ggml-cann.cpp:123: CANN error
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
warning: process 272965 is already traced by process 272863
ptrace: Operation not permitted.
No stack.
The program is not being run.
I am a beginner. Could you please advise on how to fix this issue? @hipudding @MengqingCao
@lq0104 Try setting SOC_TYPE to match your NPU and check whether that works:
https://github.com/ggerganov/whisper.cpp/blob/master/ggml/src/ggml-cann/kernels/CMakeLists.txt#L2
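A minimal sketch of passing SOC_TYPE at configure time rather than editing the linked file. It assumes that CMakeLists.txt keeps a SOC_TYPE value supplied with -D instead of overwriting it, and that the SoC name is "Ascend" followed by the chip name that npu-smi info reports (310P3 here). cann_cmake_args is a hypothetical helper for illustration, not part of whisper.cpp.

```shell
#!/bin/sh
# Hypothetical helper: print the cmake options for a CANN build targeting one SoC.
# Assumption: the kernels CMakeLists.txt honors a SOC_TYPE passed with -D.
cann_cmake_args() {
    soc="$1"
    if [ -z "$soc" ]; then
        echo "usage: cann_cmake_args <soc-type>, e.g. Ascend310P3" >&2
        return 1
    fi
    printf '%s\n' "-DGGML_CANN=on -DSOC_TYPE=$soc"
}

# Typical use from the whisper.cpp source root, starting from a clean build
# directory so a cached SOC_TYPE from an earlier configure is not reused:
#   rm -rf build
#   cmake -B build $(cann_cmake_args Ascend310P3)
#   cmake --build build -j
```

If the configure step does not pick up the value, editing the SOC_TYPE line in the linked file directly is the fallback.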
OK, I will try it, thank you
When I set SOC_TYPE to Ascend310P3, I got the following warning while compiling:
/usr/local/Ascend/ascend-toolkit/latest/tools/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:805:5: warning: 'DataCopyPadUB2GMImpl' is deprecated: NOTICE: DataCopyPad is not deprecated. Currently, DataCopyPad is an unsupported API on Ascend310p or Ascend610. Please check your code! [-Wdeprecated-declarations]
    DataCopyPadUB2GMImpl((__gm__ T*)dstGlobal.GetPhyAddr(), (__ubuf__ T*)srcLocal.GetPhyAddr(), dataCopyParams);
    ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:70:9: note: in instantiation of function template specialization 'AscendC::DataCopyPad' requested here
        DataCopyPad(dst_gm, dst_local, dataCopyParams);
        ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:88:9: note: in instantiation of member function 'DupByRows<float, float>::copy_out' requested here
        copy_out();
        ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:175:8: note: in instantiation of member function 'DupByRows<float, float>::dup' requested here
    op.dup();
       ^
/usr/local/Ascend/ascend-toolkit/latest/tools/tikcpp/tikcfw/impl/dav_m200/kernel_operator_data_copy_impl.h:1244:3: note: 'DataCopyPadUB2GMImpl' has been explicitly marked deprecated here
[[deprecated("NOTICE: DataCopyPad is not deprecated. Currently, DataCopyPad is an unsupported API on Ascend310p "
It seems that Ascend310P currently does not support the 'DataCopyPad' function.
@lq0104 I think some function definitions differ between SoCs. Someone else ran into this on the 910A. You should be able to find another function to replace DataCopyPad.
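Before picking a replacement, it helps to know where the kernels call DataCopyPad. A small sketch, assuming the kernel sources live under ggml/src/ggml-cann/kernels as in the paths from the compiler notes quoted earlier; find_datacopypad is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list every line that mentions DataCopyPad under a directory.
# The default path is an assumption about the checkout layout; pass another
# directory as the first argument if yours differs.
find_datacopypad() {
    dir="${1:-ggml/src/ggml-cann/kernels}"
    grep -rn "DataCopyPad" "$dir"
}

# Example, run from the whisper.cpp source root:
#   find_datacopypad
```

Each hit is printed as file:line:text, which gives the list of call sites that would need a copy routine the 310P supports.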