
Inputs format misunderstood. (prepare_input) (AIV-743)

Open nicklasb opened this issue 1 year ago • 12 comments

Checklist

  • [X] Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • [X] Provided a clear description of your suggestion.
  • [X] Included any relevant context or examples.

Issue or Suggestion Description

I am getting an error when quantizing a working ONNX model (at least, I can run inference with it successfully in PaddleDetection). It is a PaddleDetection model exported to ONNX using paddle2onnx: PicoDet with backbone LCNet, neck LCPAN, head PicoHeadV2.

..config, basically the pedestrian_detect model (if I understood that lineage properly), trained on different data. The error occurs when I run a variant of quantize_torch_model.py that only loads different images; the rest is the same.

[ESP-PPQ ASCII banner]


load imagenet calibration dataset from directory: C:/somepath/PaddleDetection/dataset/sea/images/val
[00:07:20] PPQ Layerwise Equalization Pass Running ... 2 equalization pair(s) was found, ready to run optimization.
Layerwise Equalization:   0%|          | 0/4 [00:00<?, ?it/s]
[Conv.24(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.28(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.24(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.28(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.24(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.28(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.24(Type: Conv, Num of Input: 3, Num of Output: 1)]
[Conv.28(Type: Conv, Num of Input: 3, Num of Output: 1)]
Layerwise Equalization: 100%|██████████| 4/4 [00:00<00:00, 53.86it/s]
Finished.
Traceback (most recent call last):
 File "C:\somepath\esp-dl\tools\quantization\quantize_custom_onnx_model.py", line 144, in <module>
   quant_ppq_graph = espdl_quantize_onnx(
                     ^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\core\defs.py", line 54, in _wrapper
   return func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\api\espdl_interface.py", line 223, in espdl_quantize_onnx
   ppq_graph = quantize_onnx_model(
               ^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\core\defs.py", line 54, in _wrapper
   return func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\api\interface.py", line 263, in quantize_onnx_model
   quantizer.quantize(
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\core\defs.py", line 54, in _wrapper
   return func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\quantization\quantizer\base.py", line 52, in quantize
   executor.tracing_operation_meta(inputs=inputs)
 File "somepath\esp-dl\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
   return func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\core\defs.py", line 54, in _wrapper
   return func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\executor\torch.py", line 616, in tracing_operation_meta
   self.__forward(
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\executor\torch.py", line 474, in __forward
   inputs = self.prepare_input(inputs=inputs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "somepath\esp-dl\.venv\Lib\site-packages\ppq\executor\base.py", line 140, in prepare_input
   assert len(inputs_dictionary) == len(inputs), \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Inputs format misunderstood. Given inputs has 1 elements, while graph needs 2

inputs:
[tensor([[[[-2.0644,  0.4211,  1.5156,  ...,  0.7785,  1.0445, -0.3174],
         [ 1.0306,  1.6350, -0.0849,  ..., -0.2133, -0.3099, -0.8846],
         [ 1.4076,  1.2802, -0.5059,  ..., -1.2710,  0.1182, -0.9397],
         ...,
         [-0.7491, -1.5121,  1.2965,  ...,  0.5195,  0.7199, -1.5996],
         [-1.9851, -1.9259,  0.6562,  ..., -2.1439,  1.0245,  0.7217],
         [-1.8503, -2.2407,  0.0850,  ...,  0.4948,  2.0727, -2.2183]],

        [[ 1.4071, -1.5879, -0.7559,  ..., -0.9110,  0.4718, -0.3414],
         [ 0.3396, -0.7012,  0.7558,  ..., -0.4906, -0.4403, -0.2816],
         [-1.4492,  1.1551,  0.9532,  ..., -0.9253, -2.3705,  0.6899],
         ...,
         [ 1.6252,  0.1358,  0.2999,  ...,  0.3089,  0.7925,  1.4319],
         [-0.2310, -0.5723, -0.6862,  ..., -1.0324,  0.8473, -0.4547],
         [-0.1850,  0.2352,  0.4126,  ...,  0.7649, -1.9019, -0.2216]],

        [[ 0.2727,  1.5507,  0.2306,  ..., -0.2771, -0.8666, -0.0491],
         [-0.8079,  1.2834, -0.1030,  ...,  0.2184, -0.5272,  0.8095],
         [ 0.2811,  0.6141,  0.6307,  ..., -0.0618, -0.3692,  0.3364],
         ...,
         [-0.5358, -2.4136,  0.8725,  ..., -0.8501, -1.2791, -0.3178],
         [ 0.4313, -1.3117,  0.6791,  ...,  0.1524,  0.6022,  1.5905],
         [ 0.2910,  0.2658, -0.4634,  ..., -0.4612, -0.0362,  0.8774]]]])]
inputs_dictionary:
{'image': image, 'scale_factor': scale_factor}

I printed inputs and inputs_dictionary because it looks like the wrong data is ending up in the wrong place.
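For illustration, the length check that raises this AssertionError can be sketched like this (function and variable names here are illustrative stand-ins, not ppq internals):

```python
# Sketch of the consistency check in ppq's prepare_input: the number of
# tensors supplied must equal the number of inputs the ONNX graph declares.
def prepare_input(inputs, graph_input_names):
    assert len(graph_input_names) == len(inputs), (
        f"Inputs format misunderstood. Given inputs has {len(inputs)} "
        f"elements, while graph needs {len(graph_input_names)}"
    )
    return dict(zip(graph_input_names, inputs))

graph_inputs = ["image", "scale_factor"]  # what this PicoDet graph declares

# One tensor against a two-input graph fails, as in the traceback above:
try:
    prepare_input(["image_tensor"], graph_inputs)
except AssertionError as err:
    failed = str(err)

# One tensor per graph input passes:
bound = prepare_input(["image_tensor", "scale_tensor"], graph_inputs)
print(sorted(bound))  # ['image', 'scale_factor']
```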

nicklasb avatar Jan 12 '25 23:01 nicklasb

@nicklasb Can you share the onnx model file?

100312dog avatar Jan 13 '25 06:01 100312dog

@nicklasb Judging from your log output, it seems that the number of input shapes you passed in does not match the number of graph inputs in the ONNX model, which is causing the issue. You can visualize the graph inputs of the ONNX model through Netron. If the model has multiple inputs, when calling the espdl_quantize_onnx interface, input_shape should be a list containing multiple lists, with each list corresponding to the shape of one input. It is important to note that for the shape of the feature map, the batch dimension should be set to 1.
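As a concrete sketch of that advice: the shapes below are assumptions based on the 'image' and 'scale_factor' inputs shown in the traceback (320×320 input resolution, one h/w scale pair), and the espdl_quantize_onnx call is only indicated in a comment, not verified against any particular esp-ppq version.

```python
# One shape list per graph input; batch dimension fixed to 1 for ESP-DL.
input_shape = [
    [1, 3, 320, 320],  # 'image' (assumed NCHW input resolution)
    [1, 2],            # 'scale_factor' (assumed one h/w scale pair)
]

# Hypothetical call sketch; consult your esp-ppq version for the full
# argument list:
# quant_ppq_graph = espdl_quantize_onnx(
#     onnx_import_file="model.onnx",
#     input_shape=input_shape,  # one shape list per graph input
#     ...
# )
```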

BlueSkyB avatar Jan 13 '25 06:01 BlueSkyB

So I think I have fixed the input shape. However, I now encounter another issue: ReduceMin is not implemented in ESP-DL. Is this something that might be implemented soon?

nicklasb avatar Jan 16 '25 21:01 nicklasb

Or maybe there is an easier way: how did ESP-DL implement the pedestrian_detect model? Basically, all I want to do is customize it.

nicklasb avatar Jan 16 '25 22:01 nicklasb

@nicklasb The model we use looks like the one in onnx.zip, without the last Sqrt op. When you export the model to ONNX, you should remove the unnecessary parts, such as NMS.

100312dog avatar Jan 17 '25 02:01 100312dog

@100312dog Ok, I'm not sure I follow: was the model generated by some Paddle project, then (the name suggests it)? And what do you mean by "the unnecessary part"? How do I remove it?

nicklasb avatar Jan 17 '25 16:01 nicklasb

@nicklasb

python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams export.post_process=False \
    --output_dir=output_inference

If you are using the official PaddleDetection project, use this command to export the model. Use Netron to visualize the ONNX. If you don't add the export.post_process option, the model looks like this:

[Image] [Image]

There are two problems with this model. First, the batch size is dynamic, but ESP-DL expects a batch size of 1. Second, the operators after (and including) the Transpose and Sqrt can be computed outside the model in C code. That is much more efficient, and it also avoids unsupported operators such as NMS (non-maximum suppression), which most inference frameworks do not support.

After exporting the model with the command above, run these commands to convert it to ONNX:

paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
            --model_filename model.pdmodel  \
            --params_filename model.pdiparams \
            --opset_version 11 \
            --save_file picodet_s_320_coco.onnx

onnxsim picodet_s_320_coco.onnx picodet_s_320_coco_sim.onnx

Then the model looks like this:

[Image]

If you want to use the post-processing code in https://github.com/espressif/esp-dl/blob/master/esp-dl/vision/detect/dl_detect_pico_postprocessor.cpp, remove the boxed ops in the picture. Use the onnx.utils.extract_model API to extract the subgraph and remove those ops, then replace the output names at https://github.com/espressif/esp-dl/blob/master/esp-dl/vision/detect/dl_detect_pico_postprocessor.cpp#L68 with those of your model.

100312dog avatar Jan 20 '25 02:01 100312dog

@100312dog This is great information, thank you! This should solve it for me!

nicklasb avatar Jan 20 '25 07:01 nicklasb

@100312dog

So I have tried different variations of the above with a picodet_s_320_coco_lcnet.yml variant, in which I basically changed nothing but the image size (to 640×480), but I seem to end up with a slightly different model. I have a couple of issues:

  1. espdl_quantize_onnx (the esp-ppq one) doesn't seem to fully quantize to int8, leaving floats in a lot of places.
  2. I end up with a model that's roughly similar to the above, but the outputs are not as shown; instead they are seven transposes (p2o.pd_op.transpose.1.0 to p2o.pd_op.transpose.7.0), and some of them have shapes like 1, 4800, 32, 1218181608.
  3. I am not sure how to use onnx.utils.extract_model to get a subgraph that removes exactly those last operations. How would it look for the above case?

nicklasb avatar Mar 12 '25 21:03 nicklasb

@100312dog @BlueSkyB Any thoughts?

nicklasb avatar Mar 18 '25 23:03 nicklasb

I'll answer the first and third questions.

1. Which specific operators have not been quantized to int8? Currently, not all operators are defined for quantization. The operators esp-ppq defines for quantization are: "Conv", "ConvTranspose", "Gemm", "Relu", "PRelu", "Clip", "Pad", "Resize", "MaxPool", "AveragePool", "GlobalMaxPool", "GlobalAveragePool", "Softmax", "Mul", "Add", "Max", "Sub", "Div", "Reshape", "LeakyRelu", "Concat", "Sigmoid", "Interp", "ReduceMean", "Transpose", "Slice", "Flatten", "HardSwish", "HardSigmoid", "MatMul", "Attention", "LayerNormalization", "Gelu", "PPQBiasFusedMatMul", "Split", "Gather", "Tanh", "Elu", "Greater", "Less", "Equal", "GreaterOrEqual", "LessOrEqual", "ReverseSequence", "Identity"

You also need to consider the operators currently supported by esp-dl (https://github.com/espressif/esp-dl/blob/master/operator_support_state.md) when designing your model. If esp-dl does not yet support certain operators, you may need to implement them yourself first, or modify the model structure.
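As a quick sanity check, you can compare the op types in your model against these lists before quantizing. A minimal sketch (the quantizable set below is only a subset of the full list above, and the model's op types are hardcoded here; in practice you would collect them via onnx with {n.op_type for n in model.graph.node}):

```python
# Subset of the esp-ppq quantizable op set listed above, for illustration.
QUANTIZABLE = {
    "Conv", "Gemm", "Relu", "Sigmoid", "Mul", "Add", "Concat",
    "Transpose", "Reshape", "Resize", "Split", "HardSwish", "Sub", "Div",
}

# Op types found in a hypothetical exported model.
model_ops = {"Conv", "HardSwish", "Concat", "Transpose", "Sqrt", "ReduceMin"}

not_quantized = sorted(model_ops - QUANTIZABLE)
print(not_quantized)  # ['ReduceMin', 'Sqrt'] -> these nodes would stay float
```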

3. How to call onnx.utils.extract_model? You can refer to the official interface documentation (https://onnx.ai/onnx/api/utils.html) together with blog posts found via search engines. Start by trying it on a small ONNX model.

BlueSkyB avatar Mar 19 '25 02:03 BlueSkyB

@BlueSkyB Thanks for your replies!

  1. Well, for the outputs, for example, the dtype is 1 (float), and they are Transpose ops. I used a picodet_s_320, but with a 640 input (no other changes, really), which should be basically the same as the pedestrian model.

  2. I can get one subtree, but I rather need to remove leaves. I just don't get it for some reason; I will look more closely.

nicklasb avatar Mar 19 '25 23:03 nicklasb