Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)

Open 980202006 opened this issue 3 years ago • 17 comments

Description

I found that the ONNX model contains twice as many kernel weights as TensorRT expects for this node, but it's not clear how to fix this error.

[08/17/2022-17:06:41] [TRT] [E] Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[08/17/2022-17:06:42] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
[08/17/2022-17:06:42] [TRT] [V] Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[08/17/2022-17:06:42] [TRT] [V] Convolution output dimensions: ()
ERROR: Failed to parse the ONNX file: end2end.onnx
ERROR: Failed to parse the ONNX file.
got 1 errors: 
In node 1840 (parseGraph): INVALID_NODE: Invalid Node - Conv_1840
Conv_1840:kernel weights has count 1474560 but 737280 was expected
Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
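
The arithmetic in the log can be reproduced by hand. A minimal sketch, where `expected_conv_weight_count` is a hypothetical helper written for this thread (it is not a TensorRT or ONNX API) and the numbers are the ones reported for Conv_1840:

```python
from functools import reduce
from operator import mul


def expected_conv_weight_count(in_channels, out_channels, kernel_shape, groups=1):
    """Number of kernel weights a (grouped) convolution needs."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kernel_volume = reduce(mul, kernel_shape, 1)
    # Each output channel only sees in_channels / groups input channels.
    return out_channels * (in_channels // groups) * kernel_volume


# Values taken from the error message for Conv_1840.
expected = expected_conv_weight_count(320, 512, (3, 3), groups=2)
print(expected)             # count TensorRT expects
print(1474560 // expected)  # ratio of the ONNX weight count to that
```

The stored count of 1474560 equals 512 * 320 * 3 * 3, which is what a weight tensor with 320 input channels per group would hold. That suggests the weight tensor disagrees with either the `group` attribute or the channel count TensorRT inferred for the input, but the log alone does not say which of the two is wrong.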

Environment

TensorRT Version:
NVIDIA GPU: cu113
CUDA Version: 8.2
Operating System: ubuntu20.04
Python Version (if applicable): 3.7.13
PyTorch Version (if applicable): 1.12+cu113

Relevant Files

https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing

Steps To Reproduce

980202006 avatar Aug 17 '22 09:08 980202006

Maybe a bug. What do your input dimensions look like? image

zerollzeng avatar Aug 17 '22 10:08 zerollzeng

error node: image

zerollzeng avatar Aug 17 '22 10:08 zerollzeng

Where can I get this visualizer? The input shape is [1,6,3,720,1296].

980202006 avatar Aug 17 '22 10:08 980202006

https://netron.app/
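
Besides the visualizer, the same information can be read from the model in Python. A minimal sketch, assuming the `onnx` package is installed; `conv_summary` is a hypothetical helper, and it only relies on the fields that an ONNX `GraphProto` exposes (`node`, `initializer`, `attribute`, `input`):

```python
def conv_summary(graph, node_name):
    """Return op type, `group` attribute and weight dims of one node.

    `graph` is expected to look like an ONNX GraphProto (model.graph).
    Returns None when no node has the given name.
    """
    # Map every initializer (stored weight tensor) to its dimensions.
    weight_dims = {init.name: list(init.dims) for init in graph.initializer}
    for node in graph.node:
        if node.name != node_name:
            continue
        # Conv's `group` attribute defaults to 1 when it is absent.
        group = next((a.i for a in node.attribute if a.name == "group"), 1)
        # For Conv, the second input is the weight tensor.
        weight = node.input[1] if len(node.input) > 1 else None
        return {
            "op_type": node.op_type,
            "group": group,
            # None if the weight is not stored as an initializer.
            "weight_dims": weight_dims.get(weight),
        }
    return None
```

With the model from this issue it could be called as `conv_summary(onnx.load("end2end.onnx").graph, "Conv_1840")` after `import onnx`. The weight dims follow the ONNX layout [out_channels, in_channels / group, kH, kW], so they can be compared directly with the numbers in the error message.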

zerollzeng avatar Aug 17 '22 12:08 zerollzeng

@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?

980202006 avatar Aug 18 '22 06:08 980202006

@zerollzeng Is there a way to map the problematic operator in onnx to the torch model code?

980202006 avatar Aug 18 '22 06:08 980202006

@zerollzeng Is there a way to map the problematic operator in onnx to the torch model code?

I tried to find an answer before but ultimately failed :-( so I don't think it's possible, and the exported node names change across PyTorch versions, AFAIK.

zerollzeng avatar Aug 18 '22 12:08 zerollzeng

@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?

Yes, but since 8.4 you don't need to worry about the workspace size; we set it to the maximum by default.
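
For anyone who still wants to cap it: as the TensorRT documentation describes it, the workspace is the temporary GPU memory that layer implementations may use, not a batch size. A minimal sketch of setting it on both older and newer releases; `set_workspace_limit` is a hypothetical helper, `trt` is assumed to be the imported `tensorrt` module and `config` an `IBuilderConfig` from `builder.create_builder_config()`:

```python
def set_workspace_limit(trt, config, num_bytes):
    """Cap the builder's scratch memory on old and new TensorRT versions."""
    if hasattr(config, "set_memory_pool_limit"):
        # TensorRT 8.4 and later: limits are set per memory pool.
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, num_bytes)
    else:
        # Older releases: a single attribute, deprecated since 8.4.
        config.max_workspace_size = num_bytes
```

A call such as `set_workspace_limit(trt, config, 1 << 30)` would request a 1 GiB limit. Passing the module in as an argument is only a choice made here so the helper does not import `tensorrt` itself.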

zerollzeng avatar Aug 18 '22 12:08 zerollzeng

Thank you. I encountered another problem here; do you have any ideas about it?

Process Process-3:
Traceback (most recent call last):
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 88, in onnx2tensorrt
    device_id=device_id)
  File "/home/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 113, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 4622 (addScatterLayer): UNSUPPORTED_NODE: Assertion failed: indicesDims.d[i] <= dataDims.d[i] && "Indices dimensions must be less than data dimensions!"

980202006 avatar Aug 19 '22 02:08 980202006

@zerollzeng

980202006 avatar Aug 19 '22 02:08 980202006

The error is raised here: https://github.com/onnx/onnx-tensorrt/blob/1da7332349d5b1196ccfa6dc719b839876f1e83e/onnx2trt_utils.cpp#L2265 It happens while parsing the ONNX model. You can check node 4622 in your ONNX model, or share it here so that I can take a look.
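
The condition in the assertion message can be checked against the shapes of node 4622 before handing the model to the parser. A minimal sketch, assuming both shapes are fully static and of equal rank; `oversized_axes` is a hypothetical helper and does not model how the parser treats dynamic dimensions:

```python
def oversized_axes(indices_dims, data_dims):
    """Return the axes where the indices tensor is larger than the data tensor.

    Mirrors the condition `indicesDims.d[i] <= dataDims.d[i]` from the
    assertion message; an empty list means the check would pass.
    """
    return [
        axis
        for axis, (idx, data) in enumerate(zip(indices_dims, data_dims))
        if idx > data
    ]


# Made-up shapes for illustration only: axis 0 of the indices is too large.
print(oversized_axes([2, 4, 8], [1, 4, 16]))
```

The shapes to pass in are the ones shown for the `data` and `indices` inputs of the scatter node, for example in Netron.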

zerollzeng avatar Aug 20 '22 11:08 zerollzeng

https://drive.google.com/file/d/1XJ86EWnUmdHEOMgYCsQHs9ESJmdlgbIW/view?usp=sharing

[08/22/2022-10:44:21] [TRT] [V] Graph construction and optimization completed in 28.9074 seconds.
[08/22/2022-10:44:22] [TRT] [V] Using cublasLt as a tactic source
[08/22/2022-10:44:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +485, GPU +206, now: CPU 1411, GPU 514 (MiB)
[08/22/2022-10:44:22] [TRT] [V] Using cuDNN as a tactic source
[08/22/2022-10:44:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +468, GPU +204, now: CPU 1879, GPU 718 (MiB)
[08/22/2022-10:44:23] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.4
[08/22/2022-10:44:23] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/22/2022-10:44:23] [TRT] [V] Constructing optimization profile number 0 [1/1].
[08/22/2022-10:44:23] [TRT] [E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
Traceback (most recent call last):
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/mmdeploy/to_fp16.py", line 274, in <module>
    build_engine_onnx(onnx_model_file)
  File "/home/mmdeploy/to_fp16.py", line 198, in build_engine_onnx
    with builder.build_engine(network, config) as engine, open(args.engine_file, "wb") as f:
AttributeError: __enter__
(base) root@ecs-0:/home/mmdeploy# conda activate mmdeploy
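
The final `AttributeError: __enter__` looks like a secondary symptom: on Python 3.7 that is what `with None as engine:` raises, so it is consistent with `builder.build_engine` having returned `None` after the shape error logged above it. A minimal sketch of surfacing that case explicitly; `require_engine` is a hypothetical helper, not part of TensorRT:

```python
def require_engine(engine, onnx_path):
    """Fail with a clear message when the builder returned None."""
    if engine is None:
        raise RuntimeError(
            f"TensorRT failed to build an engine from {onnx_path}; "
            "see the [E] lines in the builder log for the real cause"
        )
    return engine
```

In `to_fp16.py` this could wrap the build call, for example `engine = require_engine(builder.build_engine(network, config), onnx_model_file)`, before the engine is serialized and written to the file.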

980202006 avatar Aug 22 '22 03:08 980202006

@zerollzeng Thanks, here is my onnx file.

980202006 avatar Aug 22 '22 09:08 980202006

Do you use dynamic shapes? It looks like your model doesn't support dynamic shapes, or your input dimensions are invalid:

[E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
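
"Reshape would change volume" means that, for the kOPT shape in the profile, the tensor going into the reshape and the requested target shape have different element counts. A minimal sketch of that check using ONNX Reshape conventions (0 copies the input dimension, -1 is inferred); `resolve_reshape` is a hypothetical helper, and the example shape is the input shape mentioned earlier in this thread, not the actual input of Reshape_4296:

```python
from functools import reduce
from operator import mul


def volume(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def resolve_reshape(input_shape, target):
    """Resolve an ONNX-style Reshape target or raise if the volume changes."""
    if list(target).count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    # 0 copies the input dimension at the same index.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(target)]
    if -1 in out:
        known = volume(d for d in out if d != -1)
        if known == 0 or volume(input_shape) % known:
            raise ValueError("reshape would change volume")
        out[out.index(-1)] = volume(input_shape) // known
    if volume(out) != volume(input_shape):
        raise ValueError("reshape would change volume")
    return out


# Folding the first two axes of a [1,6,3,720,1296] tensor is volume-preserving.
print(resolve_reshape([1, 6, 3, 720, 1296], [-1, 3, 720, 1296]))
```

A target with hard-coded dimensions that only matches the shape used at export time would fail this check for any other profile shape, which is one common way a model ends up not supporting dynamic shapes.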

zerollzeng avatar Aug 22 '22 15:08 zerollzeng

I can't reproduce your error on my side because your model contains your own plugin:

[08/22/2022-15:37:14] [I] [TRT] No importer registered for op: grid_sampler. Attempting to import as plugin.
[08/22/2022-15:37:14] [I] [TRT] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:773: While parsing node number 292 [grid_sampler -> "onnx::Concat_561"]:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:774: --- Begin node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:775: input: "x.19"
input: "grid_flow"
output: "onnx::Concat_561"
name: "grid_sampler_292"
op_type: "grid_sampler"
attribute {
  name: "align_corners"
  i: 1
  type: INT
}
attribute {
  name: "interpolation_mode"
  i: 0
  type: INT
}
attribute {
  name: "padding_mode"
  i: 1
  type: INT
}
domain: "mmdeploy"

[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:776: --- End node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:778: ERROR: parsers/onnx/builtin_op_importers.cpp:4890 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

zerollzeng avatar Aug 22 '22 15:08 zerollzeng

My command using trtexec:

&&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=end2end_new.onnx --optShapes=input:1x3x720x1296

zerollzeng avatar Aug 22 '22 15:08 zerollzeng

https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing @zerollzeng The custom operator .so files required for my model and the Python code used for exporting are here.
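
To get past the "Plugin not found" error, the plugin library has to be loaded into the process before the ONNX file is parsed. A minimal sketch in Python, assuming the .so registers its plugin creators when it is loaded (the library itself has to do that; this code cannot); `load_plugin_library` is a hypothetical helper and the path is whatever the shared files are saved as:

```python
import ctypes
import os


def load_plugin_library(path):
    """Load a shared library so its TensorRT plugin creators can register."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"plugin library not found: {path}")
    # Loading the library runs its static initializers; libraries that
    # self-register their plugins do so at this point (an assumption here).
    return ctypes.CDLL(path)
```

This would be called once, before creating the `trt.OnnxParser`. trtexec in TensorRT 8.x also accepts `--plugins=<path to .so>` for the same purpose.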

980202006 avatar Aug 23 '22 03:08 980202006

@980202006 Does the error still exist in the latest 8.6? Thanks!

ttyio avatar Jul 18 '23 17:07 ttyio

I don't remember. I bypassed this problem by rewriting the torch forward inference code.

980202006 avatar Jul 21 '23 07:07 980202006

Okay, I'm closing this now. Feel free to reopen it if you have any further questions.

zerollzeng avatar Jul 21 '23 09:07 zerollzeng

Can you tell me how to fix this issue?

Liupei1101 avatar Mar 18 '24 05:03 Liupei1101

Have you fixed the issue?

Liupei1101 avatar Apr 09 '24 03:04 Liupei1101

error node: image

[08/17/2022-17:06:41] [TRT] [E] Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[08/17/2022-17:06:42] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
[08/17/2022-17:06:42] [TRT] [V] Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[08/17/2022-17:06:42] [TRT] [V] Convolution output dimensions: ()
ERROR: Failed to parse the ONNX file: end2end.onnx
ERROR: Failed to parse the ONNX file.
got 1 errors:
In node 1840 (parseGraph): INVALID_NODE: Invalid Node - Conv_1840
Conv_1840:kernel weights has count 1474560 but 737280 was expected
Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)

I have the same problem, and I do not know why.

Liupei1101 avatar Apr 09 '24 07:04 Liupei1101

Failed to parse the ONNX file: end2end.onnx
ERROR: Failed to parse the ONNX file.

I have the same problem. How can it be solved?

Liupei1101 avatar Apr 09 '24 07:04 Liupei1101