
Problems using pytorch stream

Open ninono12345 opened this issue 10 months ago • 2 comments

Hello, when I was working with TensorRT 8.6, I made an engine for inference in Python.

Example inputs and engine:

```python
im_patches = torch.randn(batch, 3, 288, 288)
train_feat = torch.randn(batch, 256, 18, 18)
target_labels = torch.randn(1, batch, 18, 18)
train_ltrb = torch.randn(batch, 4, 18, 18)
input_shapes2 = [im_patches, train_feat, target_labels, train_ltrb]

def load_engine3(path):
    with open(path, 'rb') as f, \
         trt.Runtime(trt.Logger(trt.Logger.WARNING)) as trt_runtime, \
         trt_runtime.deserialize_cuda_engine(f.read()) as engine, \
         engine.create_execution_context() as context:
        return engine, context
```
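(Editor's note, possibly unrelated to the crash but worth ruling out: returning the engine and context from inside those nested `with` blocks means the runtime's context manager exits as the function returns, and TensorRT expects the runtime that deserialized an engine to outlive that engine. A hypothetical variant that keeps all three objects referenced, shown only as a sketch:)

```python
def load_engine(path):
    # Hypothetical variant of load_engine3 above: keep the runtime referenced
    # alongside the engine and context so none of them is released while the
    # caller is still using them.
    import tensorrt as trt  # imported here so the sketch stays self-contained
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    return runtime, engine, context
```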

```python
def run_infer2(self, inputs):
    output1 = torch.empty((1, 1, 18, 18), dtype=torch.float32).to("cuda")
    output2 = torch.empty((1, 1, 4, 18, 18), dtype=torch.float32).to("cuda")

    bindings = [inputs[0].data_ptr(),
                inputs[1].data_ptr(),
                inputs[2].data_ptr(),
                inputs[3].data_ptr(),
                output1.data_ptr(),
                output2.data_ptr()]

    print("run_infer2")

    ript = time.time()
    stream = torch.cuda.Stream("cuda")
    context.execute_async_v2(bindings=bindings, stream_handle=stream.cuda_stream)
    rint = time.time()

    stream.synchronize()

    return output1, output2
```

I did this to avoid having to create a separate stream with PyCUDA or anything else.

Now in TensorRT 10, execute_async_v2 is gone, so I updated my code:

```python
def run_infer2(context, inputs):
    output1 = torch.empty((1, 1, 18, 18), dtype=torch.float32).to("cuda")
    output2 = torch.empty((1, 1, 4, 18, 18), dtype=torch.float32).to("cuda")

    context.set_tensor_address("im_patches", inputs[0].data_ptr())
    context.set_tensor_address("train_feat", inputs[1].data_ptr())
    context.set_tensor_address("target_labels", inputs[2].data_ptr())
    context.set_tensor_address("train_ltrb", inputs[3].data_ptr())
    context.set_tensor_address("scores_raw", output1.data_ptr())
    context.set_tensor_address("bbox_preds", output2.data_ptr())

    ript = time.time()
    stream = torch.cuda.Stream("cuda")
    current_stream = torch.cuda.current_stream()
    torch.cuda.synchronize()
    context.execute_async_v3(stream_handle=current_stream.cuda_stream)
    torch.cuda.synchronize()
    rint = time.time()
    print("context.execute_v3 time: ", rint - ript)

    stream.synchronize()

    return output1, output2
```
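(Editor's note: the snippet above creates `stream` but enqueues on `current_stream`, i.e. the default stream, which is what the TensorRT warning below refers to. A minimal sketch of driving the enqueueV3 path with a single non-default stream, reusing the tensor names from the snippet; this is untested here and assumes all inputs are already CUDA tensors:)

```python
import torch

def run_infer_v3(context, inputs, stream):
    """Sketch: enqueue on one non-default CUDA stream and sync that same stream."""
    output1 = torch.empty((1, 1, 18, 18), dtype=torch.float32, device="cuda")
    output2 = torch.empty((1, 1, 4, 18, 18), dtype=torch.float32, device="cuda")

    # Every address handed to TensorRT must be a device pointer.
    names = ["im_patches", "train_feat", "target_labels", "train_ltrb"]
    for name, t in zip(names, inputs):
        assert t.is_cuda, f"{name} is on {t.device}; TensorRT needs CUDA tensors"
        context.set_tensor_address(name, t.data_ptr())
    context.set_tensor_address("scores_raw", output1.data_ptr())
    context.set_tensor_address("bbox_preds", output2.data_ptr())

    context.execute_async_v3(stream_handle=stream.cuda_stream)
    stream.synchronize()  # wait only on the stream we enqueued on
    return output1, output2

# usage sketch:
# stream = torch.cuda.Stream()
# out1, out2 = run_infer_v3(context, cuda_inputs, stream)
```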

But now I get this error:

```
[01/26/2025-17:29:16] [TRT] [W] Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
[01/26/2025-17:29:16] [TRT] [E] IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (an illegal memory access was encountered)
Traceback (most recent call last):
  File "D:\Tomo\tracking_tomp\pytracking\infer2.py", line 185, in <module>
    outt = run_infer2(context, input_shapes2)
  File "D:\Tomo\tracking_tomp\pytracking\infer2.py", line 165, in run_infer2
    torch.cuda.synchronize()
  File "C:\Users\Tomas\AppData\Roaming\Python\Python310\site-packages\torch\cuda\__init__.py", line 954, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[01/26/2025-17:29:18] [TRT] [E] [graphContext.h::nvinfer1::rt::MyelinGraphContext::~MyelinGraphContext::84] Error Code 1: Myelin ([::0] Error 201 destroying event '000001CC98F40A70'.)
```
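(Editor's note, offered as a hypothesis rather than a confirmed diagnosis: the example inputs at the top are created with `torch.randn(...)` and no device argument, so they live in host memory, and `data_ptr()` then hands TensorRT a CPU address, which is one common cause of exactly this illegal-memory-access error. A small sanity check that would rule this out before binding:)

```python
import torch

def check_bindings(tensors):
    """Raise early if any tensor about to be bound to TensorRT is not on the
    GPU or is non-contiguous (data_ptr() of a strided view may not point at
    the data TensorRT expects)."""
    for i, t in enumerate(tensors):
        if not t.is_cuda:
            raise ValueError(f"tensor {i} is on {t.device}; move it with .cuda() first")
        if not t.is_contiguous():
            raise ValueError(f"tensor {i} is non-contiguous; call .contiguous() first")
```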

Thank you

- TensorRT version: 10.7
- OS: Windows 10
- Nvidia drivers: 561.19
- Python: 3.10
- CUDA: 12.4
- PyTorch: 2.5.1 (CUDA 12.4)

ninono12345 avatar Jan 26 '25 14:01 ninono12345

+1

CallmeZhangChenchen avatar Apr 22 '25 10:04 CallmeZhangChenchen

+1

Shiroha-Key avatar Jun 05 '25 03:06 Shiroha-Key