onnxruntime
Random failures of a TensorRT test in Windows GPU TensorRT CI
Describe the issue
A test, TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference, frequently fails in Windows GPU TensorRT CI. Judging from its name, it is a multi-threading test, so the failures may indicate a threading bug in the TensorRT execution provider. Could someone familiar with TensorRT take a look? Below are two different errors produced by this test.
2022-09-20T05:14:17.1291863Z 1: [ RUN ] TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference
2022-09-20T05:14:21.9092384Z 1: 2022-09-20 05:14:21.9100710 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-09-20 05:14:21 WARNING] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2022-09-20T05:14:21.9102071Z 1: 2022-09-20 05:14:21.9112506 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-09-20 05:14:21 WARNING] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2022-09-20T05:14:21.9391947Z 1: 2022-09-20 05:14:21.9401565 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::log] [2022-09-20 05:14:21 ERROR] 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::54] Error Code 1: Serialization (Serialization assertion sizeRead == static_cast<uint64_t>(mEnd - mCurrent) failed.Size specified in header does not match archive size)
2022-09-20T05:14:21.9396196Z 1: 2022-09-20 05:14:21.9407796 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::log] [2022-09-20 05:14:21 ERROR] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
2022-09-20T05:14:21.9398995Z 1: D:\a\_work\1\s\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(161): error: Value of: status.IsOK()
2022-09-20T05:14:21.9399681Z 1: Actual: false
2022-09-20T05:14:21.9400073Z 1: Expected: true
2022-09-20T05:14:21.9412174Z 1: [ FAILED ] TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference (4812 ms)
2022-09-19T19:08:39.2238214Z 1: [ RUN ] TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference
2022-09-19T19:08:44.6950897Z 1: 2022-09-19 19:08:44.6947546 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-09-19 19:08:44 WARNING] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2022-09-19T19:08:44.6962348Z 1: 2022-09-19 19:08:44.6959960 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-09-19 19:08:44 WARNING] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2022-09-19T19:08:44.6970013Z 1: 2022-09-19 19:08:44.6968018 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::log] [2022-09-19 19:08:44 ERROR] 3: Cannot deserialize with an empty memory buffer.
2022-09-19T19:08:44.6972776Z 1: 2022-09-19 19:08:44.6968040 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::log] [2022-09-19 19:08:44 ERROR] 3: Cannot deserialize with an empty memory buffer.
2022-09-19T19:08:44.7227686Z 1: 2022-09-19 19:08:44.7225590 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::log] [2022-09-19 19:08:44 ERROR] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
2022-09-19T19:08:44.7229367Z 1: 2022-09-19 19:08:44.7225599 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 onnxruntime::TensorrtLogger::lD:\a\_work\1\s\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(161): error: Value of: status.IsOK()
2022-09-19T19:08:44.7230326Z 1: Actual: false
2022-09-19T19:08:44.7230731Z 1: Expected: true
2022-09-19T19:08:44.7231412Z 1: og] [2022-09-19 19:08:44 ERROR] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
2022-09-19T19:08:44.7236549Z 1: D:\a\_work\1\s\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(161): error: Value of: status.IsOK()
2022-09-19T19:08:44.7237280Z 1: Actual: false
2022-09-19T19:08:44.7237699Z 1: Expected: true
2022-09-19T19:08:44.7249044Z 1: [ FAILED ] TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference (5501 ms)
To reproduce
The failure is not reliably reproducible, but you can find various error messages from this single test in the Windows GPU TensorRT CI logs.
Urgency
Not urgent. It's just a CI test.
Platform
Windows
OS Version
Microsoft Windows Server 2019
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
The master branch should be enough to reproduce.
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
Unknown; it is whatever is installed on our CI machine.