whisper.cpp

Ascend NPU 310P3 chip run error

Open lq0104 opened this issue 1 year ago • 4 comments

I am trying to run whisper.cpp with CANN on an Ascend NPU 310P3. My CANN version is 8.0.

I used these commands to compile:

    mkdir build
    cd build
    cmake .. -D GGML_CANN=on
    make -j

and this command to run inference:

    ./build/bin/main -f samples/jfk.wav -m models/ggml-base.en.bin

However, the program failed to run. The error output is:

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: using CANN backend
whisper_init_state: kv self size = 18.87 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 14.48 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (conv) = 16.75 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 124.33 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (encode) = 131.94 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 3.43 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (cross) = 5.17 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating CANN buffer from size 0.00 MiB to 140.16 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 4.38 MiB
whisper_init_state: compute buffer (decode) = 153.13 MB

system_info: n_threads = 1 / 96 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 1

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 1 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

CANN error: EZ9999: Inner Error!
EZ9999: 2024-08-19-14:22:06.035.091 The error from device(6), serial number is 13, there is an aivec error, core id is 0, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1f55953f, mte error info: 0x91, ifu error info: 0x37b7ddcfefe80, ccu error info: 0x6b8e2406002b3b8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
TraceBack (most recent call last):
The error from device(6), serial number is 13, there is an aivec error, core id is 1, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x16c7fa7b, mte error info: 0x91, ifu error info: 0xf597fa430f00, ccu error info: 0x6b8e2406000fcf8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 2, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1d0fc8f1, mte error info: 0x91, ifu error info: 0x60ebe2bbe800, ccu error info: 0x6b8e2406004cd98e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 3, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1f1ef1eb, mte error info: 0x91, ifu error info: 0x2df5cebffff80, ccu error info: 0x6b8e2406004bbb8e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 4, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1be2db77, mte error info: 0x91, ifu error info: 0x38754f9295d80, ccu error info: 0x6b8e2406007e538e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 5, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1ffddffb, mte error info: 0x91, ifu error info: 0x33ab3f3ffee80, ccu error info: 0x6b8e2406005cf78e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The error from device(6), serial number is 13, there is an aivec error, core id is 6, error code = 0x10, dump info: pc start: 0x8001240005eb108, current: 0x1240005eb650, vec error info: 0x1bfe75e7, mte error info: 0x91, ifu error info: 0x35e0b72cebd80, ccu error info: 0x6b8e24060018e08e, cube error info: 0, biu error info: 0, aic error mask: 0x6de01200c0122c8, para base: 0x12c0c0340000, errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:532]
The device(6), core list[0-6], error code is:[FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:586]
coreId( 0): 0x10 0x10 0x10 0x10 [FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:586]
coreId( 4): 0x10 0x10 0x10 [FUNC:PrintCoreInfoErrMsg][FILE:device_error_proc.cc][LINE:600]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinic_kernel_task.cc][LINE:1220]
Aicore kernel execute failed, device_id=0, stream_id=17, report_stream_id=17, task_id=86, flip_num=0, fault kernel_name=ascendc_dup_by_rows_fp32_to_fp16_3, fault kernel info ext=none, program id=0, hash=6170300059213965033.[FUNC:GetError][FILE:stream.cc][LINE:1082]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1082]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

DEVICE[0] PID[272965]:
EXCEPTION STREAM:
Exception info:TGID=563630, model id=65535, stream id=17, stream phase=SCHEDULE
Message info[0]:RTS_HWTS: Vector core exception, slot_id=3, stream_id=17
Other info[0]:time=2024-08-19-14:21:56.320.060, function=process_hwts_error_exception, line=1320, error code=0x31
current device: 0, in function ggml_backend_cann_synchronize at /home/code/whisper.cpp.src/ggml/src/ggml-cann.cpp:1591
aclrtSynchronizeStream(cann_ctx->stream())
/home/code/whisper.cpp.src/ggml/src/ggml-cann.cpp:123: CANN error
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
warning: process 272965 is already traced by process 272863
ptrace: Operation not permitted.
No stack.
The program is not being run.

I am a beginner, could you please advise on how to fix this issue? @hipudding @MengqingCao

lq0104 avatar Aug 22 '24 03:08 lq0104

@lq0104 Try setting SOC_TYPE to match your NPU and check whether that works: https://github.com/ggerganov/whisper.cpp/blob/master/ggml/src/ggml-cann/kernels/CMakeLists.txt#L2

MengqingCao avatar Aug 22 '24 03:08 MengqingCao

OK, I will try it, thank you

lq0104 avatar Aug 22 '24 06:08 lq0104

When I set SOC_TYPE = Ascend310P3, I got the following warning while compiling:

/usr/local/Ascend/ascend-toolkit/latest/tools/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:805:5: warning: 'DataCopyPadUB2GMImpl' is deprecated: NOTICE: DataCopyPad is not deprecated. Currently, DataCopyPad is an unsupported API on Ascend310p or Ascend610. Please check your code! [-Wdeprecated-declarations]
    DataCopyPadUB2GMImpl((__gm__ T*)dstGlobal.GetPhyAddr(), (__ubuf__ T*)srcLocal.GetPhyAddr(), dataCopyParams);
    ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:70:9: note: in instantiation of function template specialization 'AscendC::DataCopyPad' requested here
        DataCopyPad(dst_gm, dst_local, dataCopyParams);
        ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:88:9: note: in instantiation of member function 'DupByRows<float, float>::copy_out' requested here
        copy_out();
        ^
/home/code/whisper.cpp.src/ggml/src/ggml-cann/kernels/dup.cpp:175:8: note: in instantiation of member function 'DupByRows<float, float>::dup' requested here
    op.dup();
       ^
/usr/local/Ascend/ascend-toolkit/latest/tools/tikcpp/tikcfw/impl/dav_m200/kernel_operator_data_copy_impl.h:1244:3: note: 'DataCopyPadUB2GMImpl' has been explicitly marked deprecated here
[[deprecated("NOTICE: DataCopyPad is not deprecated. Currently, DataCopyPad is an unsupported API on Ascend310p "

It seems that Ascend310P currently does not support the 'DataCopyPad' function.

lq0104 avatar Aug 23 '24 01:08 lq0104

@lq0104 I think some function definitions differ between SoCs. Someone ran into the same problem on the 910A. You should be able to find another function to replace DataCopyPad.

hipudding avatar Aug 26 '24 09:08 hipudding
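
A note on what replacing DataCopyPad might involve. The sketch below is plain host-side C++, not Ascend C kernel code and not the fix adopted in ggml. It assumes that a block-granular copy such as plain DataCopy moves data in whole 32-byte blocks. That assumption comes from the Ascend C documentation, not from this thread. The names `kBlockBytes`, `align_up`, `RowCopyPlan` and `plan_row_copy` are hypothetical and exist only for illustration. The sketch computes how many bytes such a copy would move for one row and how many of them fall outside the row.

```cpp
#include <cstddef>

// Assumption (from the Ascend C documentation, not from this thread):
// a block-granular copy transfers whole 32-byte blocks.
constexpr std::size_t kBlockBytes = 32;

// Round a byte count up to the next multiple of the block size.
constexpr std::size_t align_up(std::size_t bytes,
                               std::size_t block = kBlockBytes) {
    return (bytes + block - 1) / block * block;
}

// Hypothetical helper type: sizes involved in copying one row.
struct RowCopyPlan {
    std::size_t payload_bytes;  // bytes that actually belong to the row
    std::size_t padded_bytes;   // bytes a block-granular copy would move
    std::size_t extra_bytes;    // padded_bytes - payload_bytes
};

// Compute the plan for a row of num_elems elements of elem_size bytes each.
constexpr RowCopyPlan plan_row_copy(std::size_t num_elems,
                                    std::size_t elem_size) {
    return RowCopyPlan{num_elems * elem_size,
                       align_up(num_elems * elem_size),
                       align_up(num_elems * elem_size) - num_elems * elem_size};
}

// Compile-time checks: 16 fp16 values fill one block exactly,
// 15 fp16 values leave 2 bytes of padding.
static_assert(plan_row_copy(16, 2).extra_bytes == 0, "aligned row");
static_assert(plan_row_copy(15, 2).extra_bytes == 2, "unaligned row");
```

For a row of 15 fp16 values (30 bytes), `plan_row_copy(15, 2)` reports 32 padded bytes and 2 extra bytes. Under the assumption above, a block-granular write would also touch those 2 bytes after the row in global memory, so a replacement for DataCopyPad would need either slack in the destination buffer or separate handling of the tail.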