onnxruntime
onnxruntime copied to clipboard
[TensorRT EP] Fix bug for DDS output handling for empty tensor
When the DDS output is empty tensor (i.e. any of the dimension is 0), TRT EP won't perform either cudaMemcpyAsync() nor cuda::Impl_Cast(), to prevent accidentally overwriting other location that might belong to other tensors.
#TODO: add unit tests to cover the DDS code paths or doing more testing with concurrent,sequential, threaded faster-rcnn using onnx_test_runner and verifying outputs