onnxruntime
A bug occurs when the program terminates
Describe the issue
Inference works well when running on the GPU, but an exception is thrown when the program terminates:
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=806358777 ; hostname=lv-voice-rt-02 ; expr=cudaEventSynchronize(e);
To reproduce
auto env = std::make_shared<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "RNNT-model");
auto session_options = std::make_shared< Ort::SessionOptions>();
session_options->SetInterOpNumThreads(1);
session_options->SetIntraOpNumThreads(1);
session_options->DisableCpuMemArena();
session_options->SetGraphOptimizationLevel(ORT_ENABLE_ALL);
auto options = std::make_shared<OrtCUDAProviderOptions>();
options->device_id = 0;
options->arena_extend_strategy = 1;
options->cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchDefault;
options->do_copy_in_default_stream = -1;
options->default_memory_arena_cfg = nullptr;
session_options->AppendExecutionProvider_CUDA(*options);
Urgency
No response
Platform
Linux
OS Version
centos
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.12.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.4
Can you try the latest version of ORT? We've not seen reports of this behavior. So, detailed instructions on how to repro will be required.
Using the code from the latest main today, I could not reproduce this issue.
This seems similar to #2804 and #10352.
Hello, I'm a member of MaaAssistantArknights, and the same problem occurs in our program.
ONNX Runtime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz
Exception:
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p);
core dump:
#0 0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
#1 0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
#2 0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
#3 0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
#4 0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
#5 0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
#6 0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
#7 0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
#8 0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
#9 0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
#10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
#11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
#12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
#13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
#14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
#15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
#16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
#17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
#18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
#19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
#20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
#21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)
For more technical details:
- We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 and create two instances of fastdeploy::Runtime.
- Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
- When the program exits normally with code 0, the "driver shutting down" error occurs.
Could this be because each Ort::Session instance owns an instance of the CUDA driver, but the CUDA driver is shut down globally when the first instance is destructed, so the second instance then tries to shut down an already-shut-down CUDA driver?
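To make the question above concrete, here is a minimal, self-contained sketch of the suspected failure mode. It is only a model: FakeDriver, FakeSession and run_teardown_demo are hypothetical names invented for illustration, and no CUDA or ONNX Runtime call is involved. The assumption is that a free call (like the cudaFreeHost in the trace above) fails once the CUDA runtime has begun shutting down, which is what CUDA error 4 ("driver shutting down") appears to indicate. Since C++ destructors are implicitly noexcept, an exception escaping one calls std::terminate, which would be consistent with the __cxa_call_terminate frames in the core dump.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for process-wide CUDA runtime state.
// NOT a CUDA or ONNX Runtime API; it only models "calls fail after shutdown".
struct FakeDriver {
    bool shutting_down = false;

    // Models a call such as cudaFreeHost: it fails once teardown has begun.
    void free_host() const {
        if (shutting_down) {
            throw std::runtime_error("driver shutting down");
        }
    }
};

// Hypothetical stand-in for an object that owns driver memory.
class FakeSession {
public:
    FakeSession(const FakeDriver& driver, std::vector<std::string>& log)
        : driver_(driver), log_(log) {}

    // Explicit release: may throw, so a caller can observe the failure.
    void release() {
        if (released_) return;
        driver_.free_host();
        released_ = true;
        log_.push_back("freed");
    }

    // Destructors are implicitly noexcept, so an exception escaping from here
    // would call std::terminate. Catch it and record the failure instead.
    ~FakeSession() {
        try {
            release();
        } catch (const std::runtime_error& e) {
            log_.push_back(std::string("skipped: ") + e.what());
        }
    }

private:
    const FakeDriver& driver_;
    std::vector<std::string>& log_;
    bool released_ = false;
};

// Runs both orderings and returns a log of what happened.
std::vector<std::string> run_teardown_demo() {
    FakeDriver driver;
    std::vector<std::string> log;
    {
        FakeSession early(driver, log);  // destroyed while the driver is alive
    }
    {
        FakeSession late(driver, log);
        driver.shutting_down = true;     // the driver goes away first
    }                                    // late's destructor can no longer free
    assert(log.size() == 2);             // each session logged exactly one entry
    return log;
}
```

Calling run_teardown_demo() returns a log with a "freed" entry for the session destroyed while the driver was alive, followed by a "skipped: driver shutting down" entry for the one destroyed afterwards. In a real program the analogous mitigation would be to make sure every session is destroyed before main returns instead of from static or global destructors; whether that resolves this particular report is not confirmed in this thread.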
I'm hitting the same problem. The program ends with:
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException' what(): /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=-2130784471 ; hostname=dev-audioaihcb1 ; expr=cudaEventSynchronize(e);
The onnxruntime version is onnxruntime-linux-x64-gpu-1.12.0.
onnxruntime-linux-x64-gpu-1.16.3 has the same problem.
Debugging with breakpoints on cudaFreeHost and cudaMallocHost:
Long text
(gdb) run
Starting program: /usr/bin/maa run main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[2024-02-21 11:27:57 WARN ] Hot update resource directory not found!
[New Thread 0x7fff58e006c0 (LWP 22641)]
[New Thread 0x7fff584006c0 (LWP 22642)]
[Thread 0x7fff584006c0 (LWP 22642) exited]
[New Thread 0x7fff57a006c0 (LWP 22643)]
[New Thread 0x7fff570006c0 (LWP 22644)]
[New Thread 0x7fff566006c0 (LWP 22645)]
[Detaching after fork from child process 22646]
[Detaching after fork from child process 22651]
[Detaching after fork from child process 22658]
[Detaching after fork from child process 22666]
[Detaching after fork from child process 22667]
[Detaching after fork from child process 22669]
[Detaching after fork from child process 22673]
[Detaching after fork from child process 22692]
[Detaching after fork from child process 22702]
[New Thread 0x7fff4fe006c0 (LWP 22712)]
[New Thread 0x7fff4f8006c0 (LWP 22713)]
[New Thread 0x7fff4f2006c0 (LWP 22714)]
[New Thread 0x7fff4e6006c0 (LWP 22716)]
[New Thread 0x7fff4ec006c0 (LWP 22715)]
[New Thread 0x7fff4e0006c0 (LWP 22717)]
[New Thread 0x7fff4da006c0 (LWP 22718)]
[New Thread 0x7fff4ce006c0 (LWP 22720)]
[New Thread 0x7fff4d4006c0 (LWP 22719)]
[New Thread 0x7fff47e006c0 (LWP 22721)]
[New Thread 0x7fff4c8006c0 (LWP 22722)]
[Detaching after fork from child process 22723]
[Detaching after fork from child process 22733]
[Detaching after fork from child process 22758]
[Detaching after fork from child process 22769]
[New Thread 0x7fff556006c0 (LWP 22780)]
[New Thread 0x7fff54c006c0 (LWP 22784)]
[New Thread 0x7fff3ec006c0 (LWP 22785)]
[New Thread 0x7fff3e2006c0 (LWP 22786)]
[New Thread 0x7fff3d8006c0 (LWP 22787)]
[New Thread 0x7fff3ce006c0 (LWP 22788)]
[New Thread 0x7fff37e006c0 (LWP 22789)]
[New Thread 0x7fff374006c0 (LWP 22790)]
[New Thread 0x7fff36a006c0 (LWP 22791)]
[New Thread 0x7fff360006c0 (LWP 22792)]
[New Thread 0x7fff356006c0 (LWP 22793)]
[New Thread 0x7fff34c006c0 (LWP 22794)]
[New Thread 0x7fff2fe006c0 (LWP 22795)]
[New Thread 0x7fff2f4006c0 (LWP 22796)]
[Switching to Thread 0x7fff566006c0 (LWP 22645)]
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff4292ebc0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff434252d0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$73 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff429315e0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff43416c60, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$74 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff4292ebc0, num_bytes=75264, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff42878fc0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 58, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$75 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
[New Thread 0x7fff272006c0 (LWP 22797)]
[New Thread 0x7fff268006c0 (LWP 22798)]
[New Thread 0x7fff25e006c0 (LWP 22799)]
[New Thread 0x7fff254006c0 (LWP 22800)]
[New Thread 0x7fff24a006c0 (LWP 22801)]
[New Thread 0x7fff1fe006c0 (LWP 22802)]
[New Thread 0x7fff1f4006c0 (LWP 22803)]
[New Thread 0x7fff1ea006c0 (LWP 22804)]
[New Thread 0x7fff1e0006c0 (LWP 22805)]
[New Thread 0x7fff1d6006c0 (LWP 22806)]
[New Thread 0x7fff1cc006c0 (LWP 22807)]
2024-02-21 11:28:14.625151011 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-02-21 11:28:14.625177901 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f64d10, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff42f63b50, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$76 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f658d0, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff42e4f2c0, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$77 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f64d10, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff42d447f0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 19, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$78 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f658d0, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7ffd785f1a30, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 5, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$79 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f64d10, num_bytes=1048576, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7fff42dae600, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 36, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$80 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f64d10, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7ffd785296b0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 58, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$81 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f658d0, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=this@entry=0x7ffd78529b70, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 7, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$82 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=2715648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, size=2715648, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff5268eb5 in onnxruntime::utils::AllocateHelper (target_mlvalue=..., source_mlvalue=..., target_stream=0x7fff40774250, allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 150, weak count 0) = {...})
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/utils.cc:91
$83 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=1810432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, size=size@entry=1810432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
(this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=3, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$84 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=7241728, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, size=size@entry=7241728, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
(this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=328, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$85 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=5431296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, size=size@entry=5431296, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
(this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=244, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$86 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42931000, num_bytes=32, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=<optimized out>, allocator=..., size=32, use_reserve=<optimized out>, stream=0x0, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057
$87 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000',
static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
(gdb) fin
Run till exit from #0 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92
92 CUDA_CALL_THROW(cudaMallocHost((void**)&p, size));
(gdb) print p
$88 = (void *) 0x7ffddca00600
(gdb) continue
Continuing.
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=33554432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff4292ebc0, size=size@entry=33554432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(alloc=..., size=33554432, use_reserve=<optimized out>, stream=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$89 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=24192, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, size=size@entry=24192, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(alloc=..., size=24192, use_reserve=<optimized out>, stream=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$90 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=5898240, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, size=size@entry=5898240, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
(this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=35, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$91 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f652f0, num_bytes=16, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=<optimized out>, allocator=..., size=16, use_reserve=<optimized out>, stream=0x0, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057
$92 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000',
static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
(gdb) fin
Run till exit from #0 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92
92 CUDA_CALL_THROW(cudaMallocHost((void**)&p, size));
(gdb) print p
$93 = (void *) 0x7ffd19200000
(gdb) continue
Continuing.
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=34504704, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=this@entry=0x7fff42f64d10, size=size@entry=34504704, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(alloc=..., size=34504704, use_reserve=<optimized out>, stream=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$94 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99 Status BFCArena::Extend(size_t rounded_bytes) {
#0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
(this=0x7fff42f658d0, num_bytes=6625920, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor
(this=0x7ffd1db128d0, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 10, weak count 0) = {...}, strides=...)
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$95 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
[Detaching after fork from child process 22814]
[Detaching after fork from child process 22825]
[Detaching after fork from child process 22836]
[Detaching after fork from child process 22846]
[Detaching after fork from child process 22857]
^C
Thread 1 "maa" received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff7423980 (LWP 22640)]
0x00007ffff7606335 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffffffcfa0, rem=0x7fffffffcfa0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) signal SIGINT
Continuing with signal SIGINT.
[Detaching after fork from child process 22870]
[Thread 0x7fff58e006c0 (LWP 22641) exited]
[Thread 0x7fff570006c0 (LWP 22644) exited]
[Thread 0x7fff57a006c0 (LWP 22643) exited]
[Thread 0x7fff566006c0 (LWP 22645) exited]
Summary
----------------------------------------
[StartUp] 11:28:05 - 11:29:04 (58s) Completed
----------------------------------------
[Infrast] 11:29:04 - Unfinished
----------------------------------------
[Recruit] Unstarted
----------------------------------------
[Mall] Unstarted
----------------------------------------
[Award] Unstarted
Error: Interrupted by user!
[Thread 0x7fff4e6006c0 (LWP 22716) exited]
[Thread 0x7fff4c8006c0 (LWP 22722) exited]
[Thread 0x7fff4f8006c0 (LWP 22713) exited]
[Thread 0x7fff47e006c0 (LWP 22721) exited]
[Thread 0x7fff4d4006c0 (LWP 22719) exited]
[Thread 0x7fff4e0006c0 (LWP 22717) exited]
[Thread 0x7fff4f2006c0 (LWP 22714) exited]
[Thread 0x7fff4fe006c0 (LWP 22712) exited]
[Thread 0x7fff4da006c0 (LWP 22718) exited]
[Thread 0x7fff4ce006c0 (LWP 22720) exited]
[Thread 0x7fff4ec006c0 (LWP 22715) exited]
[Thread 0x7fff24a006c0 (LWP 22801) exited]
[Thread 0x7fff254006c0 (LWP 22800) exited]
[Thread 0x7fff25e006c0 (LWP 22799) exited]
[Thread 0x7fff268006c0 (LWP 22798) exited]
[Thread 0x7fff272006c0 (LWP 22797) exited]
[Thread 0x7fff1cc006c0 (LWP 22807) exited]
[Thread 0x7fff1d6006c0 (LWP 22806) exited]
[Thread 0x7fff1e0006c0 (LWP 22805) exited]
[Thread 0x7fff1ea006c0 (LWP 22804) exited]
[Thread 0x7fff1f4006c0 (LWP 22803) exited]
[Thread 0x7fff1fe006c0 (LWP 22802) exited]
Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80 BFCArena::~BFCArena() {
#0 onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1 0x00007ffff51b403c in onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520
#2 onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520
#3 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42db59d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
$96 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001',
static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
Thread 1 "maa" hit Breakpoint 3.2, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80 BFCArena::~BFCArena() {
#0 onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
#3 0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071
$97 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000',
static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}
Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80 BFCArena::~BFCArena() {
#0 onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1 0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92
#2 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#3 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
$98 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000',
static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}
Thread 1 "maa" hit Breakpoint 6, 0x00007fffee456a84 in cudaFreeHost () from /opt/cuda/lib64/libcudart.so.12
(gdb) bt 3
#0 0x00007fffee456a84 in cudaFreeHost () at /opt/cuda/lib64/libcudart.so.12
#1 0x00007ffeb34b3ae1 in onnxruntime::CUDAPinnedAllocator::Free (this=<optimized out>, p=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98
#2 0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82
(More stack frames follow...)
(gdb) info args
No symbol table info available.
(gdb) info registers
rax 0x7ffee61332e8 140732758438632
rbx 0x7ffd1e5e0a98 140725112933016
rcx 0x0 0
rdx 0x100000001 4294967297
rsi 0x7ffd19200000 140725024980992
rdi 0x7ffd19200000 140725024980992
rbp 0x7fffffffd990 0x7fffffffd990
rsp 0x7fffffffd990 0x7fffffffd990
r8 0x7fff42d8b 34358963595
r9 0x7 7
r10 0x7fff42d8b2d0 140734314885840
r11 0x8e177c1ee1a1c7b3 -8207955323783100493
r12 0x7fff42f652f0 140734316827376
r13 0x7fff42d9c187 140734314955143
r14 0x1ff 511
r15 0x7ffd1d048770 140725090289520
rip 0x7fffee456a84 0x7fffee456a84 <cudaFreeHost+4>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fs_base 0x7ffff7423980 140737341700480
gs_base 0x0 0
(gdb) continue
Continuing.
Thread 1 "maa" hit Catchpoint 4 (exception thrown), 0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 <typeinfo for onnxruntime::OnnxRuntimeException>,
dest=0x7ffeb34a0cb0 <onnxruntime::OnnxRuntimeException::~OnnxRuntimeException()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81
81 PROBE2 (throw, obj, tinfo);
(gdb) bt
#0 0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 <typeinfo for onnxruntime::OnnxRuntimeException>, dest=0x7ffeb34a0cb0 <onnxruntime::OnnxRuntimeException::~OnnxRuntimeException()>)
at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007ffeb34b60f4 in onnxruntime::CudaCall<cudaError, true>
(retCode=<optimized out>, exprString=exprString@entry=0x7ffeb403524f "cudaFreeHost(p)", libName=libName@entry=0x7ffeb4035141 "CUDA", successCode=successCode@entry=cudaSuccess, msg=msg@entry=0x7ffeb40350fd "", file=file@entry=0x7ffeb403a730 "/usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc", line=98) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:90
#2 0x00007ffeb34b3b0d in onnxruntime::CUDAPinnedAllocator::Free (this=<optimized out>, p=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98
#3 0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82
#4 0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92
#5 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
#7 0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071
#8 std::__shared_ptr<onnxruntime::IAllocator, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff42d8b378, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1524
#9 std::shared_ptr<onnxruntime::IAllocator>::~shared_ptr (this=0x7fff42d8b378, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr.h:175
#10 std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >::~pair (this=0x7fff42d8b370, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_pair.h:185
#11 std::__new_allocator<std::_Rb_tree_node<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::destroy<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > (__p=0x7fff42d8b370, this=<optimized out>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:181
#12 std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::destroy<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > (__p=0x7fff42d8b370, __a=<optimized out>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/alloc_traits.h:535
#13 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_destroy_node (__p=0x7fff42d8b350, this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:625
#14 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_drop_node (this=<optimized out>, __p=0x7fff42d8b350) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:633
#15 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_erase (__x=0x7fff42d8b350, this=0x7fff42d8b250) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:1939
#16 0x00007ffff4a7ea5a in std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::~_Rb_tree (this=0x7fff42d8b250, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:736
#17 std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::~map (this=0x7fff42d8b250, __in_chrg=<optimized out>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_map.h:312
#18 std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::operator() (this=<optimized out>, __ptr=0x7fff42d8b250)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:95
#19 std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::operator() (__ptr=0x7fff42d8b250, this=<optimized out>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#20 std::unique_ptr<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >, std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > > >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396
#21 onnxruntime::SessionState::~SessionState (this=0x7fff42f64610, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/session_state.h:109
#22 0x00007ffff4a81a91 in std::default_delete<onnxruntime::SessionState>::operator() (this=<optimized out>, __ptr=0x7fff42f64610) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#23 std::default_delete<onnxruntime::SessionState>::operator() (__ptr=0x7fff42f64610, this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#24 std::unique_ptr<onnxruntime::SessionState, std::default_delete<onnxruntime::SessionState> >::~unique_ptr (this=0x7fff42da9138, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396
#25 onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530
#26 0x00007ffff4a81dee in onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530
#27 0x00007ffff5d3c568 in Ort::detail::OrtRelease (ptr=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:124
#28 Ort::detail::Base<OrtSession>::~Base (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:561
#29 Ort::detail::ConstSessionImpl<OrtSession>::~ConstSessionImpl (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:994
#30 Ort::detail::SessionImpl<OrtSession>::~SessionImpl (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1038
#31 Ort::Session::~Session (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1109
#32 fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=<optimized out>) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#33 0x00007ffff5d3c5fe in fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=<optimized out>) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#34 0x00007ffff6b97526 in std::default_delete<fastdeploy::BaseBackend>::operator() (this=0x7fff42a1ec78, __ptr=0x7fff42a1c6b0) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#35 0x00007ffff6b95542 in std::unique_ptr<fastdeploy::BaseBackend, std::default_delete<fastdeploy::BaseBackend> >::~unique_ptr (this=0x7fff42a1ec78, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#36 0x00007ffff6b979aa in fastdeploy::Runtime::~Runtime (this=0x7fff42a1e950, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/runtime.h:458
#37 0x00007ffff5d2e157 in std::_Sp_counted_ptr<fastdeploy::Runtime*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:428
#38 0x00007ffff6ae90b1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42925d80) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:346
#39 0x00007ffff6af0897 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42da8a40, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1071
#40 0x00007ffff6b94460 in std::__shared_ptr<fastdeploy::Runtime, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff42da8a38, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1524
#41 0x00007ffff6b9447c in std::shared_ptr<fastdeploy::Runtime>::~shared_ptr (this=0x7fff42da8a38, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr.h:175
#42 0x00007ffff6b944c2 in fastdeploy::FastDeployModel::~FastDeployModel (this=0x7fff42da8640, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/fastdeploy_model.h:21
#43 0x00007ffff6b97eee in fastdeploy::vision::ocr::Recognizer::~Recognizer (this=0x7fff42da8640, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/vision/ocr/ppocr/recognizer.h:31
#44 0x00007ffff6b97f14 in std::default_delete<fastdeploy::vision::ocr::Recognizer>::operator() (this=0x7ffff7229398 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+24>, __ptr=0x7fff42da8640) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#45 0x00007ffff6b95fcc in std::unique_ptr<fastdeploy::vision::ocr::Recognizer, std::default_delete<fastdeploy::vision::ocr::Recognizer> >::~unique_ptr
(this=0x7ffff7229398 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+24>, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#46 0x00007ffff6b9183a in asst::OcrPack::~OcrPack (this=0x7ffff7229388 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+8>, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.cpp:27
#47 0x00007ffff6af42af in asst::WordOcr::~WordOcr (this=0x7ffff7229380 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance>, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.h:63
#48 0x00007ffff7570b36 in __run_exit_handlers (status=1, listp=0x7ffff770a680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#49 0x00007ffff7570c80 in __GI_exit (status=<optimized out>) at exit.c:138
#50 0x00007ffff7557cd7 in __libc_start_call_main (main=main@entry=0x5555556e2e30 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffe218) at ../sysdeps/nptl/libc_start_call_main.h:74
#51 0x00007ffff7557d8a in __libc_start_main_impl (main=0x5555556e2e30 <main>, argc=3, argv=0x7fffffffe218, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe208) at ../csu/libc-start.c:360
#52 0x00005555555d0205 in _start ()
(gdb)
Note that 0x7ffd19200000, which was allocated by cudaMallocHost, failed to be freed by cudaFreeHost when the program terminated, and an exception was thrown. I have no idea how this could happen :thinking:
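One possible explanation, offered as a sketch rather than a confirmed diagnosis: frames #44 to #48 above show the session being destroyed from `__run_exit_handlers`, as part of a function-local static singleton. Exit handlers run in reverse order of registration, so if the CUDA runtime registered its own teardown after that singleton was constructed (an assumption), the driver would already be shutting down by the time `cudaFreeHost` is called. The self-contained C++ model below uses hypothetical `MockDriver` and `MockSession` types, with no real CUDA or ONNX Runtime calls, to show how the outcome depends only on registration order:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA driver state; not a real CUDA API.
struct MockDriver {
    bool alive = true;
};

// Hypothetical stand-in for a session that owns pinned host memory.
class MockSession {
public:
    MockSession(MockDriver& driver, std::vector<std::string>& log)
        : driver_(driver), log_(log) {}
    ~MockSession() {
        // Models cudaFreeHost: it only succeeds while the driver is alive.
        log_.push_back(driver_.alive ? "free ok"
                                     : "free failed: driver shutting down");
    }
private:
    MockDriver& driver_;
    std::vector<std::string>& log_;
};

// Registers two teardown steps and runs them in reverse registration
// order, the way exit handlers are run at process exit.
std::vector<std::string> simulate_exit(bool driver_registered_first) {
    std::vector<std::string> log;
    MockDriver driver;
    auto session = std::make_unique<MockSession>(driver, log);

    std::function<void()> destroy_session = [&] { session.reset(); };
    std::function<void()> tear_down_driver = [&] {
        driver.alive = false;
        log.push_back("driver torn down");
    };

    std::vector<std::function<void()>> handlers;
    if (driver_registered_first) {
        handlers = {tear_down_driver, destroy_session};
    } else {
        handlers = {destroy_session, tear_down_driver};
    }
    for (auto it = handlers.rbegin(); it != handlers.rend(); ++it) {
        (*it)();  // last registered runs first
    }
    assert(!session);  // every handler ran, so the session is gone
    return log;
}
```

Calling `simulate_exit(false)` models the suspected failing case: the driver teardown was registered last, so it runs first and the session's free fails. Calling `simulate_exit(true)` models the healthy order.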
Hello, I'm a member of MaaAssistantArknights, and the same problem occurs in our program.
ONNX Runtime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz
Exception:
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException' what(): /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p);
Core dump:
#0 0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
#1 0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
#2 0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
#3 0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
#4 0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
#5 0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
#6 0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
#7 0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
#8 0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
#9 0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
#10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
#11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
#12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
#13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
#14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
#15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
#16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
#17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
#18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
#19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
#20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
#21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)
For more technical details:
- We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 and create two instances of fastdeploy::Runtime.
- Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
- When the program exits normally with status 0, the error "driver shutting down" occurs.
Could it be that each Ort::Session instance owns an instance of the CUDA driver, but the CUDA driver is shut down globally when the first instance is destroyed, so the second instance then tries to shut down an already-shut-down CUDA driver?
I also encountered a similar problem. ORT probably has an internal global variable that is released too early, so the corresponding data can no longer be found when cudaFreeHost is called.
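If the early-release hypothesis holds, one mitigation that is often suggested for this class of shutdown-order problem is to free the session explicitly before `main()` returns, instead of leaving it to static destructors. The sketch below illustrates that pattern with hypothetical `FakeRuntime`, `FakeSession`, and `SessionHolder` types; it does not call ONNX Runtime or CUDA, and it is not confirmed as a fix for this issue:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the CUDA runtime state; not a real CUDA API.
struct FakeRuntime {
    bool alive = true;
};

// Hypothetical stand-in for Ort::Session: freeing its pinned memory
// only works while the runtime is still alive.
class FakeSession {
public:
    explicit FakeSession(FakeRuntime& runtime) : runtime_(runtime) {}
    // Returns the outcome instead of throwing, so the sketch stays testable.
    std::string free_pinned_memory() const {
        return runtime_.alive ? "ok" : "driver shutting down";
    }
private:
    FakeRuntime& runtime_;
};

// Singleton-style holder with an explicit release point, intended to be
// called at the end of main() rather than from a static destructor.
class SessionHolder {
public:
    explicit SessionHolder(FakeRuntime& runtime)
        : session_(std::make_unique<FakeSession>(runtime)) {}

    // Frees the session now; calling it a second time is a no-op.
    std::string release() {
        if (!session_) {
            return "already released";
        }
        std::string result = session_->free_pinned_memory();
        session_.reset();
        return result;
    }
private:
    std::unique_ptr<FakeSession> session_;
};
```

In a real program the equivalent step would be destroying every object that owns an `Ort::Session` before leaving `main()`, so that no session destructor is left to run from the exit handlers.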
I hit the same error and have now solved it. It occurs when another GPU task occupies the GPU and there is not enough GPU memory left (1100 MB needed while only 800 MB remains).