if_conditional() is time-consuming.
Hello everyone!
I am using TensorRT 8.2 and the Python API to build a YOLOv5 model with multiple branches.
Specifically, each convolutional layer has multiple branches, but only one branch is executed during any given inference, so I build the branching with nested network.add_if_conditional() calls.
I did get the functionality I wanted, but the exported engine file is quite large (which is not the main concern), and more importantly, the actual inference time grows as the number of branches increases.
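For context, each conditional follows the standard add_input / set_condition / add_output pattern. Here is a minimal sketch of a single conditional with placeholder tensors (feature is the branch input and cond is a 0-D bool tensor; neither name comes from my real network):

cond_layer = network.add_if_conditional()
cond_layer.set_condition(cond)  # cond: 0-D bool tensor selecting the branch
branch_in = cond_layer.add_input(feature).get_output(0)
# true branch: any subgraph built on branch_in
true_out = network.add_activation(branch_in, trt.ActivationType.RELU).get_output(0)
# false branch: here just an identity pass-through
false_out = network.add_identity(branch_in).get_output(0)
out = cond_layer.add_output(true_out, false_out).get_output(0)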
Below is the code that builds the nested if_conditional() structure for the YOLOv5 output heads. Since there are usually more than two branches, the conditionals have to be nested.
def get_yolo_head(bottleneck_csp17, bottleneck_csp20, bottleneck_csp23, weight_map, network, task_id, head_num=3):
    head_out = []
    head_in = [bottleneck_csp17, bottleneck_csp20, bottleneck_csp23]
    max_channels = 255  # all branch outputs are padded to this channel count
    for head in range(head_num):
        det0_list = []      # multi-branch outputs
        det0_if_layer = []  # multi-branch if-conditional layers
        for task in range(TOTAL_TASK - 1):
            if_conditional_layer = network.add_if_conditional()
            # set input
            cur_input = if_conditional_layer.add_input(head_in[head]).get_output(0)
            # set condition
            if_conditional_layer.set_condition(task_id[task + 1])
            # task-specific 1x1 detection convolution (true branch)
            det0 = network.add_convolution_nd(
                cur_input,
                3 * (CLASS_NUM[task + 1] + 5),
                trt.DimsHW(1, 1),
                kernel=weight_map["model.24." + str(task + 1) + ".m." + str(head) + ".weight"],
                bias=weight_map["model.24." + str(task + 1) + ".m." + str(head) + ".bias"])
            # zero padding to make the output shapes consistent across branches
            env = reshape_det(network, det0.get_output(0), max_channels)
            det0_list.append(env)
            det0_if_layer.append(if_conditional_layer)
        # base-task convolution, used as the false branch of the first conditional
        det0_base = network.add_convolution_nd(
            cur_input,
            3 * (CLASS_NUM[0] + 5),
            trt.DimsHW(1, 1),
            kernel=weight_map["model.24." + str(0) + ".m." + str(head) + ".weight"],
            bias=weight_map["model.24." + str(0) + ".m." + str(head) + ".bias"])
        # chain the conditional outputs so that the conditionals end up nested
        for task in range(TOTAL_TASK - 1):
            c_l = det0_if_layer[task]
            if task == 0:
                det0 = c_l.add_output(det0_list[task], det0_base.get_output(0)).get_output(0)
            else:
                det0 = c_l.add_output(det0_list[task], det0).get_output(0)
        head_out.append(det0)
    return head_out[0], head_out[1], head_out[2]
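reshape_det() is the helper that does the zero padding mentioned above: it pads each branch output along the channel dimension up to max_channels so that both sides of every conditional produce the same shape. Simplified, it looks roughly like this (a sketch assuming an implicit-batch network with CHW head outputs; my real helper may differ in details):

def reshape_det(network, det_out, max_channels):
    # Pad the channel dimension with a constant zero tensor so every branch
    # output has max_channels channels (assumes numpy as np and tensorrt as trt
    # are imported, and that det_out is a CHW tensor with static shape).
    c, h, w = tuple(det_out.shape)
    pad_c = max_channels - c
    if pad_c <= 0:
        return det_out
    zeros = network.add_constant(
        trt.Dims3(pad_c, h, w),
        trt.Weights(np.zeros((pad_c, h, w), dtype=np.float32))).get_output(0)
    concat = network.add_concatenation([det_out, zeros])
    concat.axis = 0  # channel axis for CHW tensors
    return concat.get_output(0)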
I would like to know if there is a better way to avoid the increase in inference time.
Any possible suggestions would be greatly appreciated!