TensorRT 8.5.2 conversion fails for models with grouped convolutions
Description
TensorRT 8.5.2 reports an error when converting a model that contains grouped convolutions.
Environment
TensorRT Version: 8.5.2.2
Relevant Files
The error message is as follows:
[08/29/2023-21:40:17] [TRT] [E] Conv_221:kernel weights has count 294912 but 147456 was expected
[08/29/2023-21:40:17] [TRT] [E] Conv_221: count of 294912 weights in kernel, but kernel dimensions (3,3) with 128 input channels, 256 output channels and 2 groups were specified. Expected Weights count is 128 * 3*3 * 256 / 2 = 147456
[08/29/2023-21:40:17] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::57] Error Code 4: Internal Error (Conv_221: number of kernel weights does not match tensor dimensions)
[08/29/2023-21:40:17] [TRT] [E] ModelImporter.cpp:726: While parsing node number 221 [Conv -> "input.487"]:
[08/29/2023-21:40:17] [TRT] [E] ModelImporter.cpp:727: --- Begin node ---
[08/29/2023-21:40:17] [TRT] [E] ModelImporter.cpp:728: input: "input.483"
input: "bbox_head.reg_convs.1.conv.weight"
input: "bbox_head.reg_convs.1.conv.bias"
output: "input.487"
name: "Conv_221"
op_type: "Conv"
attribute {
name: "dilations"
ints: 1
ints: 1
type: INTS
}
attribute {
name: "group"
i: 2
type: INT
}
attribute {
name: "kernel_shape"
ints: 3
ints: 3
type: INTS
}
attribute {
name: "pads"
ints: 1
ints: 1
ints: 1
ints: 1
type: INTS
}
attribute {
name: "strides"
ints: 1
ints: 1
type: INTS
}
[08/29/2023-21:40:17] [TRT] [E] ModelImporter.cpp:729: --- End node ---
[08/29/2023-21:40:17] [TRT] [E] ModelImporter.cpp:731: ERROR: ModelImporter.cpp:172 In function parseGraph:
[6] Invalid Node - Conv_221
Conv_221:kernel weights has count 294912 but 147456 was expected
Conv_221: count of 294912 weights in kernel, but kernel dimensions (3,3) with 128 input channels, 256 output channels and 2 groups were specified. Expected Weights count is 128 * 3*3 * 256 / 2 = 147456
[convolutionNode.cpp::computeOutputExtents::57] Error Code 4: Internal Error (Conv_221: number of kernel weights does not match tensor dimensions)
The error points to a mismatch between the weight tensor and the convolution's dimensions, yet the same model and configuration file run normally in PyTorch. The message says "kernel dimensions (3,3) with 128 input channels, 256 output channels and 2 groups were specified", but this convolution actually has 256 input channels: that is the value I used when building the model, so it is not in doubt. I therefore suspect that the grouped convolution causes TensorRT to infer the channel count incorrectly, since the channel counts in the model itself are consistent from layer to layer.
I also hit a second error while converting ShuffleNet:
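The two weight counts in the log can be reproduced with a few lines of arithmetic. This is only a sketch of the formula printed in the error message; the helper name `expected_conv_weight_count` is hypothetical and not part of TensorRT.

```python
def expected_conv_weight_count(in_channels, out_channels, kernel_shape, groups):
    """Number of kernel weights a grouped 2-D convolution needs.

    Mirrors the formula printed in the TensorRT log:
    in_channels * kh * kw * out_channels / groups.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both channel counts")
    kh, kw = kernel_shape
    return (in_channels // groups) * kh * kw * out_channels


# Channel count TensorRT inferred for Conv_221 (from the log): 128 inputs.
inferred = expected_conv_weight_count(128, 256, (3, 3), groups=2)

# Channel count the PyTorch model defines (from the report): 256 inputs.
defined = expected_conv_weight_count(256, 256, (3, 3), groups=2)

print(inferred, defined)  # 147456 294912
```

The 294912 weights stored in the ONNX file match a 256-channel input exactly, which suggests the weights are fine and the channel count inferred for the incoming tensor is what went wrong.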
[08/29/2023-17:23:23] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::54] Error Code 4: Internal Error (Conv_136: group count must divide input channel count)
[08/29/2023-17:23:23] [TRT] [E] ModelImporter.cpp:726: While parsing node number 136 [Conv -> "input.131"]:
[08/29/2023-17:23:23] [TRT] [E] ModelImporter.cpp:727: --- Begin node ---
[08/29/2023-17:23:23] [TRT] [E] ModelImporter.cpp:728: input: "input.123"
input: "onnx::Conv_2562"
input: "onnx::Conv_2563"
output: "input.131"
name: "Conv_136"
op_type: "Conv"
attribute {
name: "dilations"
ints: 1
ints: 1
type: INTS
}
attribute {
name: "group"
i: 116
type: INT
}
attribute {
name: "kernel_shape"
ints: 3
ints: 3
type: INTS
}
attribute {
name: "pads"
ints: 1
ints: 1
ints: 1
ints: 1
type: INTS
}
attribute {
name: "strides"
ints: 2
ints: 2
type: INTS
}
[08/29/2023-17:23:23] [TRT] [E] ModelImporter.cpp:729: --- End node ---
[08/29/2023-17:23:23] [TRT] [E] ModelImporter.cpp:731: ERROR: ModelImporter.cpp:172 In function parseGraph:
[6] Invalid Node - Conv_136
[convolutionNode.cpp::computeOutputExtents::54] Error Code 4: Internal Error (Conv_136: group count must divide input channel count)
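The log does not state how many input channels TensorRT inferred for Conv_136. As a sketch under two assumptions, namely that the layer is a depthwise convolution over 116 channels (the `group` attribute is 116) and that the inferred count was halved in the same way as for Conv_221 (256 seen as 128), the divisibility check would fail like this. The helper name is hypothetical.

```python
def groups_divide_channels(in_channels, groups):
    """The condition TensorRT reports: group count must divide input channels."""
    return in_channels % groups == 0


GROUPS = 116  # "group" attribute of Conv_136, taken from the log

# ASSUMPTION: the PyTorch layer is depthwise, so it receives 116 channels.
ok_in_pytorch = groups_divide_channels(116, GROUPS)

# ASSUMPTION: the parser sees half as many channels, as it did for Conv_221.
ok_if_halved = groups_divide_channels(58, GROUPS)

print(ok_in_pytorch, ok_if_halved)  # True False
```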
Both models use the channel shuffle operation, implemented as follows:

import torch


def channel_shuffle(x, groups):
    """Channel Shuffle operation.

    This function enables cross-group information flow for multiple groups
    convolution layers.

    Args:
        x (Tensor): The input tensor.
        groups (int): The number of groups to divide the input tensor
            in the channel dimension.

    Returns:
        Tensor: The output tensor after channel shuffle operation.
    """
    batch_size, num_channels, height, width = x.size()
    assert (num_channels % groups == 0), ('num_channels should be '
                                          'divisible by groups')
    channels_per_group = num_channels // groups

    # Split channels into (groups, channels_per_group), swap the two axes,
    # then flatten back to a single channel axis.
    x = x.view(batch_size, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    x = x.view(batch_size, -1, height, width)

    return x
Could this operation interfere with the TensorRT conversion?
I ran further experiments: a model with only a single grouped convolution converts successfully, and with the shuffle operation removed, consecutive grouped convolutions also convert successfully.
I can now confirm that the conversion failure is caused by the shuffle operation, but this operation is essential. How can I convert the model while keeping the shuffle?
During training in PyTorch the shuffle operation behaves as expected, with 256 input and 256 output channels. Why does its output channel count become 128 after conversion to TensorRT, so that the next convolution sees 128 input channels? The PyTorch model runs normally, so the model definition itself should be fine.
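To illustrate why the channel count should not change, the view/transpose/view sequence can be modelled in plain Python as a reordering of channel indices. This is a sketch of the intended PyTorch semantics only, not of what the TensorRT parser does; `channel_shuffle_order` is a hypothetical helper.

```python
def channel_shuffle_order(num_channels, groups):
    """Return, for each output channel, the input channel it is read from."""
    if num_channels % groups != 0:
        raise ValueError("num_channels should be divisible by groups")
    channels_per_group = num_channels // groups

    # view(groups, channels_per_group): row g holds the channels of group g.
    grid = [[g * channels_per_group + c for c in range(channels_per_group)]
            for g in range(groups)]

    # transpose(1, 2) followed by view(-1): read the grid column by column.
    return [grid[g][c] for c in range(channels_per_group) for g in range(groups)]


order = channel_shuffle_order(256, 2)
print(len(order))  # 256
print(order[:4])   # [0, 128, 1, 129]
```

The result is a permutation of all 256 channels, so a correct conversion has to hand 256 channels to the next convolution. The 128 in the log equals `channels_per_group`, the size of axis 1 after the transpose, which may hint at where the inferred shape goes wrong.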
[6] Invalid Node - Conv_221 Conv_221:kernel weights has count 294912 but 147456 was expected Conv_221: count of 294912 weights in kernel, but kernel dimensions (3,3) with 128 input channels, 256 output channels and 2 groups were specified. Expected Weights count is 128 * 3*3 * 256 / 2 = 147456
Does the model work with onnxruntime? What's the input shape when building the engine?
The ONNX model exports successfully. The input feature map and the output both have 256 channels.
Could you please try the latest TRT release? If it still fails, please provide a reproducer so we can investigate further. Many thanks!
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!
Please reopen this issue!
I am having the same issue. TensorRT version: nvcr.io/nvidia/tensorrt:23.05-py3
@braindotai please try the latest release and share a repro with @zerollzeng if it still fails, thanks!
@ttyio appreciate it. I'll update in a couple of days. Thanks
@ttyio The issue still occurs on different devices with both v10.0.1.6 and v8.6.1.
Hi,
I fixed the problem by calling .contiguous() on the result of the final .view(...) call. I suspect the error occurs because the tensor data is not laid out suitably in memory.
def channel_shuffle(x, groups):
    """Channel Shuffle operation.

    This function enables cross-group information flow for multiple groups
    convolution layers.

    Args:
        x (Tensor): The input tensor.
        groups (int): The number of groups to divide the input tensor
            in the channel dimension.

    Returns:
        Tensor: The output tensor after channel shuffle operation.
    """
    batch_size, num_channels, height, width = x.size()
    assert (num_channels % groups == 0), ('num_channels should be '
                                          'divisible by groups')
    channels_per_group = num_channels // groups

    x = x.view(batch_size, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    x = x.view(batch_size, -1, height, width)

    return x.contiguous()