🐞 ONNX Export for Reverse Distillation and RKDE fails

Open ashwinvaidya17 opened this issue 1 year ago • 6 comments

Describe the bug

ONNX export fails with error

torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape.  [Caused by the value '1113 defined in (%1113 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cuda:0) = onnx::Reshap

Dataset

N/A

Model

N/A

Steps to reproduce the behavior

Un-skip the ReverseDistillation test in export and run tests/integration/model/test_models.py

OS information

  • OS: [e.g. Ubuntu 20.04]
  • Python version: [e.g. 3.8.10]: 3.10
  • Anomalib version: [e.g. 0.3.6]: v1 Branch
  • PyTorch version: [e.g. 1.9.0]
  • CUDA/cuDNN version: [e.g. 11.1]
  • GPU models and configuration: [e.g. 2x GeForce RTX 3090]
  • Any other relevant information: [e.g. I'm using a custom dataset]

Expected behavior

Should export successfully.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

v1

Configuration YAML

NA

Logs

NA Needs further investigation

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

ashwinvaidya17 avatar Nov 30 '23 11:11 ashwinvaidya17

This seems to be a problem with the kornia filter. I've read https://github.com/pytorch/pytorch/issues/85464 and https://github.com/onnx/onnx/issues/4580, and it seems to have something to do with shape propagation. I also came across https://github.com/pytorch/pytorch/issues/98497, where they say that the kernel size must be known exactly, but kornia does this:

kernel: torch.Tensor = get_gaussian_kernel2d(kernel_size, sigma)
out = filter2d(input, kernel[None], border_type)

which might be problematic.

blaz-r avatar Dec 04 '23 11:12 blaz-r

Hi, I would like to resolve this issue. Can you assign it to me?

Ashutosh-Gera avatar Mar 15 '24 21:03 Ashutosh-Gera

Hi @Ashutosh-Gera, thanks for the interest, you can handle this.

blaz-r avatar Mar 15 '24 23:03 blaz-r

Hi everybody, I have the same issue with Reverse Distillation.

Thanks

enricobv avatar Mar 28 '24 07:03 enricobv

Upgrading kornia to 0.7.2 doesn't solve the issue but moves it to another function:

torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape.  
[Caused by the value '1165 defined in (%1165 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0](%1140, %1164), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:126:0
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Reshape'.] 
    (node defined in /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py(126): filter2d
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py(188): filter2d_separable
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/gaussian.py(65): gaussian_blur2d
/home/abogusze/repos/anomalib/src/anomalib/models/image/reverse_distillation/anomaly_map.py(90): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/src/anomalib/models/image/reverse_distillation/torch_model.py(85): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(74): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(124): wrapper
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(133): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(1285): _get_trace_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(915): _trace_and_get_graph_from_model
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1011): _create_jit_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1135): _model_to_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1596): _export
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(516): export
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(217): export_to_onnx
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(292): export_to_openvino
/home/abogusze/repos/anomalib/src/anomalib/engine/engine.py(910): export
/home/abogusze/repos/anomalib/tmp_train_export.py(21): <module>
)

    Inputs:
        #0: 1140 defined in (%1140 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Pad[mode="reflect"](%1055, %1139), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:122:0
    )  (type 'Tensor')
        #1: 1164 defined in (%1164 : int[] = prim::ListConstruct(%615, %1146, %1155, %1163), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator
    )  (type 'List[int]')
    Outputs:
        #0: 1165 defined in (%1165 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0](%1140, %1164), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:126:0
    )  (type 'Tensor')

adrianboguszewski avatar Apr 05 '24 11:04 adrianboguszewski

There was an issue in 2022 related to this same blur (#476). That's the reason blur.py was created, so I think switching to it inside RevDist might work.

blaz-r avatar Apr 05 '24 12:04 blaz-r
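
To illustrate the idea behind the last suggestion, here is a minimal sketch of a blur layer that builds its Gaussian kernel once at construction time and stores it as a buffer, so the convolution weight has a fixed shape when the model is traced. This is not anomalib's blur.py or kornia's implementation: `FixedGaussianBlur` and `gaussian_kernel2d` are hypothetical names, the channel count is assumed to be known up front, and whether this resolves the export error for Reverse Distillation has not been verified here.

```python
import torch
import torch.nn.functional as F
from torch import nn


def gaussian_kernel2d(kernel_size: int, sigma: float) -> torch.Tensor:
    """Return a normalized (kernel_size, kernel_size) Gaussian kernel."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g = torch.exp(-(coords**2) / (2 * sigma**2))
    g = g / g.sum()
    # The outer product of two normalized 1D Gaussians sums to 1.
    return torch.outer(g, g)


class FixedGaussianBlur(nn.Module):
    """Hypothetical blur layer whose kernel is computed once, in __init__."""

    def __init__(self, channels: int, kernel_size: int, sigma: float) -> None:
        super().__init__()
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd")
        self.channels = channels
        self.padding = kernel_size // 2
        kernel = gaussian_kernel2d(kernel_size, sigma)
        # Depthwise weight of shape (channels, 1, k, k), fixed at construction,
        # so nothing about the kernel depends on the input at trace time.
        weight = kernel.expand(channels, 1, kernel_size, kernel_size).clone()
        self.register_buffer("kernel", weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reflect-pad so the output keeps the input's spatial size.
        x = F.pad(x, (self.padding,) * 4, mode="reflect")
        return F.conv2d(x, self.kernel, groups=self.channels)
```

Under these assumptions, the layer would be created once (for example `FixedGaussianBlur(channels=1, kernel_size=5, sigma=1.0)` for a single-channel anomaly map) and called in `forward` in place of a function that rebuilds the kernel on every call.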