
A bug occurs when the program terminates

Open busishengui opened this issue 1 year ago • 7 comments

Describe the issue

It works well when running on the GPU, but it fails when the program terminates:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=806358777 ; hostname=lv-voice-rt-02 ; expr=cudaEventSynchronize(e);

To reproduce

	auto env = std::make_shared<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "RNNT-model");
	auto session_options = std::make_shared< Ort::SessionOptions>();
	session_options->SetInterOpNumThreads(1);
	session_options->SetIntraOpNumThreads(1);
	session_options->DisableCpuMemArena();
	session_options->SetGraphOptimizationLevel(ORT_ENABLE_ALL); 
	auto options = std::make_shared<OrtCUDAProviderOptions>();
	options->device_id = 0; 
	options->arena_extend_strategy = 1;
	options->cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchDefault;
	options->do_copy_in_default_stream = -1;
	options->default_memory_arena_cfg = nullptr;
	session_options->AppendExecutionProvider_CUDA(*options);

Urgency

No response

Platform

Linux

OS Version

centos

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.4

busishengui avatar Mar 23 '23 06:03 busishengui

Can you try the latest version of ORT? We've not seen reports of this behavior. So, detailed instructions on how to repro will be required.

pranavsharma avatar Mar 29 '23 00:03 pranavsharma

Using the code from the latest main today, I could not reproduce this issue.

satyajandhyala avatar Apr 26 '23 23:04 satyajandhyala

This seems similar to #2804 and #10352.

Cryolitia avatar Nov 08 '23 12:11 Cryolitia

Hello, I'm a member of MaaAssistantArknights, and the same error occurs in our program.

ONNX Runtime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz

Exception:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p); 

core dump:

                #0  0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                #1  0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
                #2  0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
                #3  0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
                #4  0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
                #5  0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
                #6  0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
                #7  0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
                #8  0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
                #9  0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
                #10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
                #11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
                #12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
                #13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
                #14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
                #15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
                #16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
                #17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
                #18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
                #19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
                #20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
                #21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)
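
Frames #9 to #14 show the exception leaving CudaCall while CUDAExecutionProvider is being destroyed, and frames #3 to #8 show the unwinder ending in __cxa_call_terminate. That pattern looks consistent with an exception escaping a destructor: since C++11, destructors are noexcept by default, so an exception that leaves one calls std::terminate. The following is a minimal sketch of that mechanism in plain C++, not ONNX Runtime code; checked_free, PinnedBuffer, and destroy_after_shutdown are hypothetical names, and the "driver" is only a bool.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a checked CUDA call that throws on failure.
inline void checked_free(bool driver_alive) {
    if (!driver_alive) {
        throw std::runtime_error("CUDA failure 4: driver shutting down");
    }
}

// Hypothetical owner of pinned memory. Its destructor catches the late
// failure instead of letting it escape.
class PinnedBuffer {
public:
    PinnedBuffer(const bool& driver_alive, std::string& last_error)
        : driver_alive_(driver_alive), last_error_(last_error) {}

    ~PinnedBuffer() {
        try {
            checked_free(driver_alive_);
        } catch (const std::exception& e) {
            // Without this catch the exception would leave a noexcept
            // destructor and the process would end in std::terminate.
            last_error_ = e.what();
        }
    }

private:
    const bool& driver_alive_;
    std::string& last_error_;
};

// Destructors are implicitly noexcept unless declared otherwise.
static_assert(std::is_nothrow_destructible<PinnedBuffer>::value,
              "PinnedBuffer's destructor is implicitly noexcept");

// Destroys a buffer after the simulated driver has gone away and returns
// the error that the destructor recorded.
inline std::string destroy_after_shutdown() {
    bool driver_alive = true;
    std::string last_error;
    {
        PinnedBuffer buf(driver_alive, last_error);
        driver_alive = false;  // simulate the runtime unloading first
    }
    return last_error;
}
```

Calling destroy_after_shutdown() returns the recorded message rather than aborting, which is the difference between a logged failure and the core dump above.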

For more technical details:

  1. We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 and create two instances of fastdeploy::Runtime.
  2. Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
  3. When the program exits normally with code 0, the "driver shutting down" error occurs.

Could it be that each Ort::Session instance owns an instance of the CUDA driver, but the CUDA driver is shut down globally when the first instance is destructed, so the second instance then tries to shut down an already shut-down CUDA driver?
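
Whatever the exact ownership is, the symptom depends on teardown order: a session that is released after the driver has shut down makes a late call that fails. The sketch below illustrates only that ordering in plain C++, under the assumption that the driver is a process-wide resource. FakeDriver, FakeSession, and the two teardown functions are hypothetical stand-ins, not ONNX Runtime or CUDA APIs.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA runtime: once shut down, every call fails.
struct FakeDriver {
    bool alive = true;
    // Returns 0 on success and 4 after shutdown (mirrors "CUDA failure 4").
    int free_host(void* /*p*/) const { return alive ? 0 : 4; }
    void shutdown() { alive = false; }
};

// Hypothetical stand-in for a session that frees driver memory when destroyed.
class FakeSession {
public:
    FakeSession(FakeDriver& driver, std::vector<std::string>& log)
        : driver_(driver), log_(log) {}

    ~FakeSession() {
        // The outcome is recorded instead of thrown, so the sketch never aborts.
        const int rc = driver_.free_host(nullptr);
        log_.push_back(rc == 0 ? "free ok" : "free failed: driver shutting down");
    }

private:
    FakeDriver& driver_;
    std::vector<std::string>& log_;
};

// Safe order: release the session first, then shut the driver down.
inline std::vector<std::string> teardown_session_first() {
    std::vector<std::string> log;
    FakeDriver driver;
    auto session = std::make_unique<FakeSession>(driver, log);
    session.reset();
    driver.shutdown();
    return log;
}

// Failing order: the driver shuts down while the session is still alive.
inline std::vector<std::string> teardown_driver_first() {
    std::vector<std::string> log;
    FakeDriver driver;
    auto session = std::make_unique<FakeSession>(driver, log);
    driver.shutdown();
    session.reset();
    return log;
}
```

In a real program, the equivalent of teardown_session_first would be to release every object that holds an Ort::Session (here, the fastdeploy::Runtime instances) before main returns, rather than leaving them to static destruction. Whether that fully avoids the error in this case is untested.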

Cryolitia avatar Nov 08 '23 12:11 Cryolitia

I hit the same problem. The program ends with:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=-2130784471 ; hostname=dev-audioaihcb1 ; expr=cudaEventSynchronize(e);

The onnxruntime version is onnxruntime-linux-x64-gpu-1.12.0.

airstillblue avatar Nov 09 '23 08:11 airstillblue

onnxruntime-linux-x64-gpu-1.16.3 has the same problem.

LLsmile avatar Nov 21 '23 02:11 LLsmile

Debugging with breakpoints on cudaFreeHost and cudaMallocHost

Long text
(gdb) run

Starting program: /usr/bin/maa run main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[2024-02-21 11:27:57 WARN ] Hot update resource directory not found!
[New Thread 0x7fff58e006c0 (LWP 22641)]
[New Thread 0x7fff584006c0 (LWP 22642)]
[Thread 0x7fff584006c0 (LWP 22642) exited]
[New Thread 0x7fff57a006c0 (LWP 22643)]
[New Thread 0x7fff570006c0 (LWP 22644)]
[New Thread 0x7fff566006c0 (LWP 22645)]
[Detaching after fork from child process 22646]
[Detaching after fork from child process 22651]
[Detaching after fork from child process 22658]
[Detaching after fork from child process 22666]
[Detaching after fork from child process 22667]
[Detaching after fork from child process 22669]
[Detaching after fork from child process 22673]
[Detaching after fork from child process 22692]
[Detaching after fork from child process 22702]
[New Thread 0x7fff4fe006c0 (LWP 22712)]
[New Thread 0x7fff4f8006c0 (LWP 22713)]
[New Thread 0x7fff4f2006c0 (LWP 22714)]
[New Thread 0x7fff4e6006c0 (LWP 22716)]
[New Thread 0x7fff4ec006c0 (LWP 22715)]
[New Thread 0x7fff4e0006c0 (LWP 22717)]
[New Thread 0x7fff4da006c0 (LWP 22718)]
[New Thread 0x7fff4ce006c0 (LWP 22720)]
[New Thread 0x7fff4d4006c0 (LWP 22719)]
[New Thread 0x7fff47e006c0 (LWP 22721)]
[New Thread 0x7fff4c8006c0 (LWP 22722)]
[Detaching after fork from child process 22723]
[Detaching after fork from child process 22733]
[Detaching after fork from child process 22758]
[Detaching after fork from child process 22769]
[New Thread 0x7fff556006c0 (LWP 22780)]
[New Thread 0x7fff54c006c0 (LWP 22784)]
[New Thread 0x7fff3ec006c0 (LWP 22785)]
[New Thread 0x7fff3e2006c0 (LWP 22786)]
[New Thread 0x7fff3d8006c0 (LWP 22787)]
[New Thread 0x7fff3ce006c0 (LWP 22788)]
[New Thread 0x7fff37e006c0 (LWP 22789)]
[New Thread 0x7fff374006c0 (LWP 22790)]
[New Thread 0x7fff36a006c0 (LWP 22791)]
[New Thread 0x7fff360006c0 (LWP 22792)]
[New Thread 0x7fff356006c0 (LWP 22793)]
[New Thread 0x7fff34c006c0 (LWP 22794)]
[New Thread 0x7fff2fe006c0 (LWP 22795)]
[New Thread 0x7fff2f4006c0 (LWP 22796)]
[Switching to Thread 0x7fff566006c0 (LWP 22645)]

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff4292ebc0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff434252d0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$73 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff429315e0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff43416c60, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$74 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff4292ebc0, num_bytes=75264, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff42878fc0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 58, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$75 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}
[New Thread 0x7fff272006c0 (LWP 22797)]
[New Thread 0x7fff268006c0 (LWP 22798)]
[New Thread 0x7fff25e006c0 (LWP 22799)]
[New Thread 0x7fff254006c0 (LWP 22800)]
[New Thread 0x7fff24a006c0 (LWP 22801)]
[New Thread 0x7fff1fe006c0 (LWP 22802)]
[New Thread 0x7fff1f4006c0 (LWP 22803)]
[New Thread 0x7fff1ea006c0 (LWP 22804)]
[New Thread 0x7fff1e0006c0 (LWP 22805)]
[New Thread 0x7fff1d6006c0 (LWP 22806)]
[New Thread 0x7fff1cc006c0 (LWP 22807)]
2024-02-21 11:28:14.625151011 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-02-21 11:28:14.625177901 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f64d10, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff42f63b50, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$76 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f658d0, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff42e4f2c0, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 3, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$77 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f64d10, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff42d447f0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 19, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$78 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f658d0, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7ffd785f1a30, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 5, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$79 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f64d10, num_bytes=1048576, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7fff42dae600, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 36, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$80 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f64d10, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7ffd785296b0, p_type=p_type@entry=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 58, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$81 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f658d0, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=this@entry=0x7ffd78529b70, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 7, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$82 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=2715648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, size=2715648, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff5268eb5 in onnxruntime::utils::AllocateHelper (target_mlvalue=..., source_mlvalue=..., target_stream=0x7fff40774250, allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 150, weak count 0) = {...})
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/utils.cc:91
$83 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=1810432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, size=size@entry=1810432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
    (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=3, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$84 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=7241728, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, size=size@entry=7241728, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
    (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=328, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$85 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=5431296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, size=size@entry=5431296, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
    (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=244, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$86 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42931000, num_bytes=32, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=<optimized out>, allocator=..., size=32, use_reserve=<optimized out>, stream=0x0, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057
$87 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', 
      static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12

(gdb) fin

Run till exit from #0  0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92
92          CUDA_CALL_THROW(cudaMallocHost((void**)&p, size));

(gdb) print p

$88 = (void *) 0x7ffddca00600
(gdb) continue 

Continuing.

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=33554432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff4292ebc0, size=size@entry=33554432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (alloc=..., size=33554432, use_reserve=<optimized out>, stream=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$89 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=24192, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, size=size@entry=24192, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (alloc=..., size=24192, use_reserve=<optimized out>, stream=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$90 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=5898240, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, size=size@entry=5898240, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper
    (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=35, element_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, location=..., shape=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587
$91 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f652f0, num_bytes=16, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=<optimized out>, allocator=..., size=16, use_reserve=<optimized out>, stream=0x0, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057
$92 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', 
      static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12

(gdb) fin

Run till exit from #0  0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12
0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92
92          CUDA_CALL_THROW(cudaMallocHost((void**)&p, size));

(gdb) print p

$93 = (void *) 0x7ffd19200000
(gdb) continue 

Continuing.

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=34504704, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=<optimized out>, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=this@entry=0x7fff42f64d10, size=size@entry=34504704, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871
#3  0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (alloc=..., size=34504704, use_reserve=<optimized out>, stream=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121
$94 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
99      Status BFCArena::Extend(size_t rounded_bytes) {
#0  onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99
#1  0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function<void (onnxruntime::Stream&, onnxruntime::synchronize::Notification&)>)
    (this=0x7fff42f658d0, num_bytes=6625920, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351
#2  0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=<optimized out>, size=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272
#3  0x00007ffff524f782 in onnxruntime::Tensor::Tensor
    (this=0x7ffd1db128d0, p_type=0x7ffff5b343a0 <onnxruntime::PrimitiveDataType<float>::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr<onnxruntime::IAllocator> (use count 10, weak count 0) = {...}, strides=...)
    at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72
$95 = {_vptr.IAllocator = 0x7ffff5ac1b58 <vtable for onnxruntime::CPUAllocator+16>, memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}}
[Detaching after fork from child process 22814]
[Detaching after fork from child process 22825]
[Detaching after fork from child process 22836]
[Detaching after fork from child process 22846]
[Detaching after fork from child process 22857]
^C
Thread 1 "maa" received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff7423980 (LWP 22640)]
0x00007ffff7606335 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffffffcfa0, rem=0x7fffffffcfa0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48        r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,

(gdb) signal SIGINT 

Continuing with signal SIGINT.
[Detaching after fork from child process 22870]
[Thread 0x7fff58e006c0 (LWP 22641) exited]
[Thread 0x7fff570006c0 (LWP 22644) exited]
[Thread 0x7fff57a006c0 (LWP 22643) exited]
[Thread 0x7fff566006c0 (LWP 22645) exited]
Summary
----------------------------------------
[StartUp] 11:28:05 - 11:29:04 (58s) Completed
----------------------------------------
[Infrast] 11:29:04 - Unfinished
----------------------------------------
[Recruit] Unstarted
----------------------------------------
[Mall] Unstarted
----------------------------------------
[Award] Unstarted
Error: Interrupted by user!
[Thread 0x7fff4e6006c0 (LWP 22716) exited]
[Thread 0x7fff4c8006c0 (LWP 22722) exited]
[Thread 0x7fff4f8006c0 (LWP 22713) exited]
[Thread 0x7fff47e006c0 (LWP 22721) exited]
[Thread 0x7fff4d4006c0 (LWP 22719) exited]
[Thread 0x7fff4e0006c0 (LWP 22717) exited]
[Thread 0x7fff4f2006c0 (LWP 22714) exited]
[Thread 0x7fff4fe006c0 (LWP 22712) exited]
[Thread 0x7fff4da006c0 (LWP 22718) exited]
[Thread 0x7fff4ce006c0 (LWP 22720) exited]
[Thread 0x7fff4ec006c0 (LWP 22715) exited]
[Thread 0x7fff24a006c0 (LWP 22801) exited]
[Thread 0x7fff254006c0 (LWP 22800) exited]
[Thread 0x7fff25e006c0 (LWP 22799) exited]
[Thread 0x7fff268006c0 (LWP 22798) exited]
[Thread 0x7fff272006c0 (LWP 22797) exited]
[Thread 0x7fff1cc006c0 (LWP 22807) exited]
[Thread 0x7fff1d6006c0 (LWP 22806) exited]
[Thread 0x7fff1e0006c0 (LWP 22805) exited]
[Thread 0x7fff1ea006c0 (LWP 22804) exited]
[Thread 0x7fff1f4006c0 (LWP 22803) exited]
[Thread 0x7fff1fe006c0 (LWP 22802) exited]

Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80      BFCArena::~BFCArena() {
#0  onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1  0x00007ffff51b403c in onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520
#2  onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520
#3  0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42db59d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
$96 = {_vptr.IAllocator = 0x7ffee6133268 <vtable for onnxruntime::CUDAAllocator+16>, memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', 
      static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}}

Thread 1 "maa" hit Breakpoint 3.2, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80      BFCArena::~BFCArena() {
#0  onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1  0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
#3  0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071
$97 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', 
      static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}

Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
80      BFCArena::~BFCArena() {
#0  onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80
#1  0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92
#2  0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#3  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
$98 = {_vptr.IAllocator = 0x7ffee61332e8 <vtable for onnxruntime::CUDAPinnedAllocator+16>, memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', 
      static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}}

Thread 1 "maa" hit Breakpoint 6, 0x00007fffee456a84 in cudaFreeHost () from /opt/cuda/lib64/libcudart.so.12

(gdb) bt 3

#0  0x00007fffee456a84 in cudaFreeHost () at /opt/cuda/lib64/libcudart.so.12
#1  0x00007ffeb34b3ae1 in onnxruntime::CUDAPinnedAllocator::Free (this=<optimized out>, p=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98
#2  0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82
(More stack frames follow...)
(gdb) info args

No symbol table info available.
(gdb) info registers 

rax            0x7ffee61332e8      140732758438632
rbx            0x7ffd1e5e0a98      140725112933016
rcx            0x0                 0
rdx            0x100000001         4294967297
rsi            0x7ffd19200000      140725024980992
rdi            0x7ffd19200000      140725024980992
rbp            0x7fffffffd990      0x7fffffffd990
rsp            0x7fffffffd990      0x7fffffffd990
r8             0x7fff42d8b         34358963595
r9             0x7                 7
r10            0x7fff42d8b2d0      140734314885840
r11            0x8e177c1ee1a1c7b3  -8207955323783100493
r12            0x7fff42f652f0      140734316827376
r13            0x7fff42d9c187      140734314955143
r14            0x1ff               511
r15            0x7ffd1d048770      140725090289520
rip            0x7fffee456a84      0x7fffee456a84 <cudaFreeHost+4>
eflags         0x206               [ PF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7423980      140737341700480
gs_base        0x0                 0
(gdb) continue 

Continuing.

Thread 1 "maa" hit Catchpoint 4 (exception thrown), 0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 <typeinfo for onnxruntime::OnnxRuntimeException>, 
    dest=0x7ffeb34a0cb0 <onnxruntime::OnnxRuntimeException::~OnnxRuntimeException()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81
81        PROBE2 (throw, obj, tinfo);

(gdb) bt

#0  0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 <typeinfo for onnxruntime::OnnxRuntimeException>, dest=0x7ffeb34a0cb0 <onnxruntime::OnnxRuntimeException::~OnnxRuntimeException()>)
    at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffeb34b60f4 in onnxruntime::CudaCall<cudaError, true>
    (retCode=<optimized out>, exprString=exprString@entry=0x7ffeb403524f "cudaFreeHost(p)", libName=libName@entry=0x7ffeb4035141 "CUDA", successCode=successCode@entry=cudaSuccess, msg=msg@entry=0x7ffeb40350fd "", file=file@entry=0x7ffeb403a730 "/usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc", line=98) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:90
#2  0x00007ffeb34b3b0d in onnxruntime::CUDAPinnedAllocator::Free (this=<optimized out>, p=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98
#3  0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82
#4  0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92
#5  0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346
#6  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317
#7  0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071
#8  std::__shared_ptr<onnxruntime::IAllocator, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff42d8b378, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1524
#9  std::shared_ptr<onnxruntime::IAllocator>::~shared_ptr (this=0x7fff42d8b378, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr.h:175
#10 std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >::~pair (this=0x7fff42d8b370, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_pair.h:185
#11 std::__new_allocator<std::_Rb_tree_node<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::destroy<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > (__p=0x7fff42d8b370, this=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:181
#12 std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::destroy<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > (__p=0x7fff42d8b370, __a=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/alloc_traits.h:535
#13 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_destroy_node (__p=0x7fff42d8b350, this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:625
#14 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_drop_node (this=<optimized out>, __p=0x7fff42d8b350) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:633
#15 std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::_M_erase (__x=0x7fff42d8b350, this=0x7fff42d8b250) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:1939
#16 0x00007ffff4a7ea5a in std::_Rb_tree<OrtDevice, std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> >, std::_Select1st<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > >, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::~_Rb_tree (this=0x7fff42d8b250, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:736
#17 std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >::~map (this=0x7fff42d8b250, __in_chrg=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_map.h:312
#18 std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::operator() (this=<optimized out>, __ptr=0x7fff42d8b250)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:95
#19 std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > >::operator() (__ptr=0x7fff42d8b250, this=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#20 std::unique_ptr<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > >, std::default_delete<std::map<OrtDevice, std::shared_ptr<onnxruntime::IAllocator>, std::less<OrtDevice>, std::allocator<std::pair<OrtDevice const, std::shared_ptr<onnxruntime::IAllocator> > > > > >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396
#21 onnxruntime::SessionState::~SessionState (this=0x7fff42f64610, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/session_state.h:109
#22 0x00007ffff4a81a91 in std::default_delete<onnxruntime::SessionState>::operator() (this=<optimized out>, __ptr=0x7fff42f64610) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#23 std::default_delete<onnxruntime::SessionState>::operator() (__ptr=0x7fff42f64610, this=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89
#24 std::unique_ptr<onnxruntime::SessionState, std::default_delete<onnxruntime::SessionState> >::~unique_ptr (this=0x7fff42da9138, __in_chrg=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396
#25 onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530
#26 0x00007ffff4a81dee in onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=<optimized out>) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530
#27 0x00007ffff5d3c568 in Ort::detail::OrtRelease (ptr=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:124
#28 Ort::detail::Base<OrtSession>::~Base (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:561
#29 Ort::detail::ConstSessionImpl<OrtSession>::~ConstSessionImpl (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:994
#30 Ort::detail::SessionImpl<OrtSession>::~SessionImpl (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1038
#31 Ort::Session::~Session (this=0x7fff42a1c6c8, __in_chrg=<optimized out>) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1109
#32 fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=<optimized out>) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#33 0x00007ffff5d3c5fe in fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=<optimized out>) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#34 0x00007ffff6b97526 in std::default_delete<fastdeploy::BaseBackend>::operator() (this=0x7fff42a1ec78, __ptr=0x7fff42a1c6b0) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#35 0x00007ffff6b95542 in std::unique_ptr<fastdeploy::BaseBackend, std::default_delete<fastdeploy::BaseBackend> >::~unique_ptr (this=0x7fff42a1ec78, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#36 0x00007ffff6b979aa in fastdeploy::Runtime::~Runtime (this=0x7fff42a1e950, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/runtime.h:458
#37 0x00007ffff5d2e157 in std::_Sp_counted_ptr<fastdeploy::Runtime*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:428
#38 0x00007ffff6ae90b1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42925d80) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:346
#39 0x00007ffff6af0897 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42da8a40, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1071
#40 0x00007ffff6b94460 in std::__shared_ptr<fastdeploy::Runtime, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff42da8a38, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1524
#41 0x00007ffff6b9447c in std::shared_ptr<fastdeploy::Runtime>::~shared_ptr (this=0x7fff42da8a38, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/shared_ptr.h:175
#42 0x00007ffff6b944c2 in fastdeploy::FastDeployModel::~FastDeployModel (this=0x7fff42da8640, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/fastdeploy_model.h:21
#43 0x00007ffff6b97eee in fastdeploy::vision::ocr::Recognizer::~Recognizer (this=0x7fff42da8640, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/vision/ocr/ppocr/recognizer.h:31
#44 0x00007ffff6b97f14 in std::default_delete<fastdeploy::vision::ocr::Recognizer>::operator() (this=0x7ffff7229398 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+24>, __ptr=0x7fff42da8640) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#45 0x00007ffff6b95fcc in std::unique_ptr<fastdeploy::vision::ocr::Recognizer, std::default_delete<fastdeploy::vision::ocr::Recognizer> >::~unique_ptr
    (this=0x7ffff7229398 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+24>, __in_chrg=<optimized out>) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#46 0x00007ffff6b9183a in asst::OcrPack::~OcrPack (this=0x7ffff7229388 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance+8>, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.cpp:27
#47 0x00007ffff6af42af in asst::WordOcr::~WordOcr (this=0x7ffff7229380 <asst::SingletonHolder<asst::WordOcr>::get_instance()::unique_instance>, __in_chrg=<optimized out>) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.h:63
#48 0x00007ffff7570b36 in __run_exit_handlers (status=1, listp=0x7ffff770a680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#49 0x00007ffff7570c80 in __GI_exit (status=<optimized out>) at exit.c:138
#50 0x00007ffff7557cd7 in __libc_start_call_main (main=main@entry=0x5555556e2e30 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffe218) at ../sysdeps/nptl/libc_start_call_main.h:74
#51 0x00007ffff7557d8a in __libc_start_main_impl (main=0x5555556e2e30 <main>, argc=3, argv=0x7fffffffe218, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe208) at ../csu/libc-start.c:360
#52 0x00005555555d0205 in _start ()
(gdb) 

Note that the buffer at 0x7ffd19200000, allocated by cudaMallocHost, could not be freed by cudaFreeHost when the program terminated, and an exception was thrown. I have no idea how this could happen :thinking:
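One possible reading of the backtrace: `CUDAPinnedAllocator::Free` is reached from `~BFCArena`, and `CudaCall<cudaError, true>` throws there. Destructors are implicitly `noexcept` since C++11, so an exception escaping one ends in `std::terminate`, which would match the "terminate called after throwing" message. The sketch below illustrates a non-throwing cleanup pattern in plain C++. `FakeCudaRuntime` and `PinnedBuffer` are hypothetical stand-ins invented for this illustration, not ONNX Runtime or CUDA APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA runtime: once `driver_alive` is false,
// every free fails with code 4, mirroring "CUDA failure 4: driver shutting down".
struct FakeCudaRuntime {
    bool driver_alive = true;
    int free_host(void* /*p*/) const { return driver_alive ? 0 : 4; }
};

// Hypothetical owner of a pinned buffer. An exception leaving this destructor
// would call std::terminate, so the failure is recorded instead of thrown.
class PinnedBuffer {
public:
    PinnedBuffer(const FakeCudaRuntime& rt, std::vector<std::string>& log)
        : rt_(rt), log_(log) {}
    ~PinnedBuffer() {
        const int err = rt_.free_host(nullptr);
        if (err != 0) {
            log_.push_back("free_host failed with code " + std::to_string(err));
        }
    }

private:
    const FakeCudaRuntime& rt_;
    std::vector<std::string>& log_;
};

// Destroys a buffer after the fake driver has gone away and returns the log.
std::vector<std::string> free_after_shutdown() {
    FakeCudaRuntime rt;
    std::vector<std::string> log;
    {
        PinnedBuffer buf(rt, log);
        rt.driver_alive = false;  // the "driver" shuts down first
    }                             // ~PinnedBuffer runs here and does not throw
    return log;
}
```

Calling `free_after_shutdown()` returns a log with a single failure entry, and the process keeps running. This only shows why a throwing free inside a destructor is fatal; it does not explain why the driver is already gone at that point.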

horror-proton avatar Feb 21 '24 04:02 horror-proton

Hello, I'm a member of MaaAssistantArknights, and the same problem occurs in our program.

ONNX Runtime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz

Exception:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p); 

core dump:

                #0  0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                #1  0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
                #2  0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
                #3  0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
                #4  0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
                #5  0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
                #6  0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
                #7  0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
                #8  0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
                #9  0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
                #10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
                #11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
                #12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
                #13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
                #14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
                #15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
                #16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
                #17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
                #18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
                #19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
                #20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
                #21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)

For more technical details:

  1. We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 and create two instances of fastdeploy::Runtime.
  2. Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
  3. When the program exits normally (status 0), the "driver shutting down" error occurs.

Could it be that each Ort::Session instance owns its own instance of the CUDA driver, but the CUDA driver is shut down globally when the first instance is destructed, so the second instance then tries to shut down an already shut-down CUDA driver?
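Another candidate explanation is exit-time ordering. The gdb backtrace earlier in this thread shows the session being destroyed from `__run_exit_handlers` through a function-local static singleton. Exit handlers and static destructors run in reverse order of registration, so if the CUDA runtime registers its own teardown after the singleton is constructed (an assumption, not verified here), the driver would already be gone when the session is released. The plain C++ model below sketches that ordering and one possible workaround, releasing the session explicitly before exit. `ExitModel`, `FakeDriver`, `FakeSession` and `simulate` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of process exit: handlers run in reverse registration
// order, the same rule std::atexit and static destructors follow.
struct ExitModel {
    std::vector<std::function<void()>> handlers;
    void at_exit(std::function<void()> f) { handlers.push_back(std::move(f)); }
    void run() {
        for (auto it = handlers.rbegin(); it != handlers.rend(); ++it) (*it)();
    }
};

struct FakeDriver { bool alive = false; };

// Stand-in for an object that owns a session; releasing it needs a live driver.
struct FakeSession {
    FakeDriver& driver;
    std::vector<std::string>& log;
    ~FakeSession() {
        log.push_back(driver.alive ? "session freed"
                                   : "free failed: driver shutting down");
    }
};

std::vector<std::string> simulate(bool release_before_exit) {
    ExitModel exit_model;
    FakeDriver driver;
    std::vector<std::string> log;
    std::unique_ptr<FakeSession> holder;

    // 1. The singleton holder exists first, so its cleanup is registered first.
    exit_model.at_exit([&] { holder.reset(); });
    // 2. The driver is initialised lazily when the session is created, so its
    //    shutdown is registered later and therefore runs earlier at exit.
    driver.alive = true;
    exit_model.at_exit([&] { driver.alive = false; });
    holder.reset(new FakeSession{driver, log});

    if (release_before_exit) holder.reset();  // explicit teardown before exit
    exit_model.run();
    return log;
}
```

In this model, `simulate(false)` records the failed free, while `simulate(true)` records a clean release because the session is dropped while the driver is still alive. Whether an explicit release hook is practical in a real application depends on how its singletons are structured.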

I also encountered a similar problem. ORT probably has an internal global variable that is released too early, so the corresponding data can no longer be found when cudaFreeHost is called.
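If the cause really is a dependency being released before its users, one general C++ way to pin the order is to keep both in a single owner: non-static members are destroyed in reverse declaration order. The sketch below shows only that language rule. `FakeEnv`, `FakeSession` and `Engine` are hypothetical stand-ins, and nothing here is confirmed to match ORT internals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a long-lived dependency (for example, an environment object).
struct FakeEnv {
    std::vector<std::string>& log;
    ~FakeEnv() { log.push_back("env released"); }
};

// Stand-in for an object that must be released while the dependency is alive.
struct FakeSession {
    std::vector<std::string>& log;
    ~FakeSession() { log.push_back("session released"); }
};

// Members are destroyed in reverse declaration order: session first, env last.
struct Engine {
    FakeEnv env;
    FakeSession session;
};

// Builds and destroys an Engine, returning the order in which members died.
std::vector<std::string> destroy_engine() {
    std::vector<std::string> log;
    {
        Engine engine{{log}, {log}};  // nested braces initialise members in place
    }
    return log;
}
```

`destroy_engine()` returns the two entries with the session first and the environment second, which is the order a dependent object needs.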

hua-hua-lin avatar Apr 01 '24 10:04 hua-hua-lin

I hit the same error and have now solved it. In my case it occurs when another GPU task is occupying the GPU and there is not enough GPU memory (1100 MB needed, but only 800 MB free).

Kenneth-X avatar Apr 07 '24 09:04 Kenneth-X