TensorRT
[runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([myelinGraphExecute] Called without resolved dynamic shapes.)
Description
I converted an ONNX model to a TensorRT engine and am using the C++ API on a T600 GPU. When I execute bool status = context_->executeV2(buffers.getDeviceBindings().data()); it reports the error: [runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([myelinGraphExecute] Called without resolved dynamic shapes.). The exported ONNX file itself is complete.
More information is as follows:
[11/15/2023-11:38:41] [I] [TRT] Loaded engine size: 50 MiB
[11/15/2023-11:38:41] [V] [TRT] Deserialization required 23795 microseconds.
[11/15/2023-11:38:41] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +44, now: CPU 0, GPU 88 (MiB)
[11/15/2023-11:38:41] [V] [TRT] Total per-runner device persistent memory is 13920256
[11/15/2023-11:38:41] [V] [TRT] Total per-runner host persistent memory is 69152
[11/15/2023-11:38:41] [V] [TRT] Allocated activation device memory of size 611810304
[11/15/2023-11:38:41] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +596, now: CPU 0, GPU 684 (MiB)
[11/15/2023-11:38:41] [V] [TRT] CUDA lazy loading is enabled.
[11/15/2023-11:38:41] [I] [TRT] [MS] Running engine with multi stream info
[11/15/2023-11:38:41] [I] [TRT] [MS] Number of aux streams is 1
[11/15/2023-11:38:41] [I] [TRT] [MS] Number of total worker streams is 2
[11/15/2023-11:38:41] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[11/15/2023-11:38:41] [V] [TRT] Total per-runner device persistent memory is 1025024
[11/15/2023-11:38:41] [V] [TRT] Total per-runner host persistent memory is 787408
[11/15/2023-11:38:41] [V] [TRT] Allocated activation device memory of size 35127296
[11/15/2023-11:38:41] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +35, now: CPU 1, GPU 719 (MiB)
[11/15/2023-11:38:41] [V] [TRT] CUDA lazy loading is enabled.
[11/15/2023-11:38:41] [I] [TRT] [MS] Running engine with multi stream info
[11/15/2023-11:38:41] [I] [TRT] [MS] Number of aux streams is 1
[11/15/2023-11:38:41] [I] [TRT] [MS] Number of total worker streams is 2
[11/15/2023-11:38:41] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[11/15/2023-11:38:41] [V] [TRT] Total per-runner device persistent memory is 0
[11/15/2023-11:38:41] [V] [TRT] Total per-runner host persistent memory is 9472
[11/15/2023-11:38:41] [V] [TRT] Allocated activation device memory of size 116929536
[11/15/2023-11:38:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +111, now: CPU 1, GPU 830 (MiB)
[11/15/2023-11:38:42] [V] [TRT] CUDA lazy loading is enabled.
[11/15/2023-11:38:42] [E] [TRT] 1: [runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([myelinGraphExecute] Called without resolved dynamic shapes.)
I do call setBindingDimensions() for my input tensors. Example code follows:
const int keypoints_0_index = engine_->getBindingIndex(lightglue_config_.input_tensor_names[0].c_str());
const int keypoints_1_index = engine_->getBindingIndex(lightglue_config_.input_tensor_names[1].c_str());
const int descriptors_0_index = engine_->getBindingIndex(lightglue_config_.input_tensor_names[2].c_str());
const int descriptors_1_index = engine_->getBindingIndex(lightglue_config_.input_tensor_names[3].c_str());
const int output_match_index = engine_->getBindingIndex(lightglue_config_.output_tensor_names[0].c_str());
const int output_score_index = engine_->getBindingIndex(lightglue_config_.output_tensor_names[1].c_str());
context_->setBindingDimensions(keypoints_0_index, Dims3(1, features0.cols(), 2));
context_->setBindingDimensions(keypoints_1_index, Dims3(1, features1.cols(), 2));
context_->setBindingDimensions(descriptors_0_index, Dims3(1, features0.cols(), 256));
context_->setBindingDimensions(descriptors_1_index, Dims3(1, features1.cols(), 256));
keypoints_0_dims_ = context_->getBindingDimensions(keypoints_0_index);
keypoints_1_dims_ = context_->getBindingDimensions(keypoints_1_index);
descriptors_0_dims_ = context_->getBindingDimensions(descriptors_0_index);
descriptors_1_dims_ = context_->getBindingDimensions(descriptors_1_index);
std::cout << " " << keypoints_0_dims_ << keypoints_1_dims_ << descriptors_0_dims_ << descriptors_1_dims_
          << " " << output_match_dims_ << " " << output_score_dims_ << std::endl;
The printed result is:
(1, 605, 2)(1, 613, 2)(1, 605, 256)(1, 613, 256) (0, 2) (0)
then
if (!process_input(buffers, norm_keypoints0, norm_keypoints1)) {
return false;
}
buffers.copyInputToDevice();
bool status = context_->executeV2(buffers.getDeviceBindings().data());
if (!status) {
std::cout<<" infer failed! "<<output_match_dims_<<std::endl;
return false;
}
The error is reported here.
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: T600
NVIDIA Driver Version: 535.113.01
CUDA Version: 11.8
CUDNN Version: 8.9.4.25
Operating System: Ubuntu 20.04
The ONNX model can be found here: onnx. Thanks.
I've requested access.
Could you please try this with trtexec? e.g. trtexec --onnx=model.onnx.
If possible, it would be great if you could test the latest 9.1 release or other newer GPUs.
Thanks for your reply. I will try to solve it. Another similar ONNX file is here: onnx.
@001SCH Hello! How was your problem solved?
@001SCH Hello! How was your problem solved? I'm encountering the same problem.
@001SCH