
if_conditional() is time-consuming.

Parsifal133 opened this issue 6 months ago · 6 comments

Hello everyone!

I am using TensorRT 8.2 and the Python API to build a YOLOv5 model with multiple branches.

Specifically, each output convolution has multiple task-specific branches, but only one branch is executed per inference, so I am using nested network.add_if_conditional() calls.
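
For reference, the basic if-conditional pattern in the TensorRT Python API looks like the minimal sketch below (cond_tensor, x, w_a, b_a, w_b, and b_b are placeholder names, not from my actual code):

cond = network.add_if_conditional()
cond.set_condition(cond_tensor)           # 0-D bool tensor that selects the branch
x_in = cond.add_input(x).get_output(0)    # tensor made visible inside the conditional

# true branch
conv_a = network.add_convolution_nd(x_in, 32, trt.DimsHW(1, 1), kernel=w_a, bias=b_a)
# false branch
conv_b = network.add_convolution_nd(x_in, 32, trt.DimsHW(1, 1), kernel=w_b, bias=b_b)

# both branch outputs must have identical shapes
out = cond.add_output(conv_a.get_output(0), conv_b.get_output(0)).get_output(0)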

I got the functionality working, but the exported engine file is quite large (a secondary concern). More importantly, the actual inference time grows as the number of branches increases.

Below is the code that builds nested if_conditional() blocks for the YOLOv5 output heads. Since there are usually more than two branches, the conditionals have to be nested.

import tensorrt as trt  # TOTAL_TASK, CLASS_NUM, and reshape_det are defined elsewhere

def get_yolo_head(bottleneck_csp17, bottleneck_csp20, bottleneck_csp23, weight_map, network, task_id, head_num=3):

    head_out = []
    head_in = [bottleneck_csp17, bottleneck_csp20, bottleneck_csp23]
    max_ch = 255  # largest branch channel count; every branch is zero-padded to this

    for head in range(head_num):
        det0_list = []      # multi-branch outputs
        det0_if_layer = []  # multi-branch if-conditional layers

        # Build one conditional per non-base task; they are nested via the chaining loop below.
        for task in range(TOTAL_TASK - 1):
            if_conditional_layer = network.add_if_conditional()
            # Route the head input into the conditional's scope.
            cur_input = if_conditional_layer.add_input(head_in[head]).get_output(0)
            # This branch runs when task_id[task + 1] is true.
            if_conditional_layer.set_condition(task_id[task + 1])

            det0 = network.add_convolution_nd(cur_input,
                                              3 * (CLASS_NUM[task + 1] + 5),
                                              trt.DimsHW(1, 1),
                                              kernel=weight_map["model.24." + str(task + 1) + ".m." + str(head) + ".weight"],
                                              bias=weight_map["model.24." + str(task + 1) + ".m." + str(head) + ".bias"])

            # Zero-pad the channels so all branch outputs have identical shapes,
            # as required by add_output().
            env = reshape_det(network, det0.get_output(0), max_ch)

            det0_list.append(env)
            det0_if_layer.append(if_conditional_layer)

        # Base (task 0) branch: the fallback when no other task is selected.
        # (Its output already has max_ch channels, so no padding is applied.)
        det0_base = network.add_convolution_nd(cur_input,
                                               3 * (CLASS_NUM[0] + 5),
                                               trt.DimsHW(1, 1),
                                               kernel=weight_map["model.24.0.m." + str(head) + ".weight"],
                                               bias=weight_map["model.24.0.m." + str(head) + ".bias"])

        # Chain the conditionals: each one selects its own branch when its
        # condition is true, otherwise the result of the conditionals chained so far.
        for task in range(TOTAL_TASK - 1):
            c_l = det0_if_layer[task]
            if task == 0:
                det0 = c_l.add_output(det0_list[task], det0_base.get_output(0)).get_output(0)
            else:
                det0 = c_l.add_output(det0_list[task], det0).get_output(0)
        head_out.append(det0)

    return head_out[0], head_out[1], head_out[2]
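
For completeness, reshape_det (not shown above) zero-pads the channel dimension so every branch output reaches max_ch channels. A rough sketch of what it does, assuming static NCHW shapes (my actual helper may differ in details):

import numpy as np

def reshape_det(network, tensor, max_ch):
    # Concatenate a zero constant along the channel axis until the
    # output has exactly max_ch channels.
    dims = tuple(tensor.shape)
    c = dims[1]
    if c == max_ch:
        return tensor
    pad_dims = (dims[0], max_ch - c) + dims[2:]
    zeros = network.add_constant(pad_dims, np.zeros(pad_dims, dtype=np.float32))
    cat = network.add_concatenation([tensor, zeros.get_output(0)])
    cat.axis = 1  # channel axis in NCHW
    return cat.get_output(0)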

I would like to know if there is a better way to avoid the increase in inference time.

Any possible suggestions would be greatly appreciated!

Parsifal133 · Aug 07 '24 09:08