anomalib
🐞 ONNX Export for Reverse Distillation and RKDE fails
Describe the bug
ONNX export fails with error
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape. [Caused by the value '1113 defined in (%1113 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cuda:0) = onnx::Reshap
Dataset
N/A
Model
N/A
Steps to reproduce the behavior
Un-skip the ReverseDistillation test in export and run tests/integration/model/test_models.py
OS information
- OS: [e.g. Ubuntu 20.04]
- Python version: [e.g. 3.8.10]: 3.10
- Anomalib version: [e.g. 0.3.6]: v1 Branch
- PyTorch version: [e.g. 1.9.0]
- CUDA/cuDNN version: [e.g. 11.1]
- GPU models and configuration: [e.g. 2x GeForce RTX 3090]
- Any other relevant information: [e.g. I'm using a custom dataset]
Expected behavior
Should export successfully.
Screenshots
No response
Pip/GitHub
GitHub
What version/branch did you use?
v1
Configuration YAML
NA
Logs
NA
Needs further investigation
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This seems to be a problem with the kornia filter. I've read https://github.com/pytorch/pytorch/issues/85464 and https://github.com/onnx/onnx/issues/4580, and it seems to be related to shape propagation. I also came across https://github.com/pytorch/pytorch/issues/98497, where they say that the kernel size must be known exactly at export time, but kornia does this:
kernel: torch.Tensor = get_gaussian_kernel2d(kernel_size, sigma)
out = filter2d(input, kernel[None], border_type)
which might be problematic.
Hi, I would like to resolve this issue. Can you assign it to me?
Hi @Ashutosh-Gera, thanks for the interest, you can handle this.
Hi everybody, I have the same issue with Reverse Distillation
Thanks
Upgrading kornia to 0.7.2 doesn't solve the issue but moves it to another function:
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape.
[Caused by the value '1165 defined in (%1165 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0](%1140, %1164), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:126:0
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Reshape'.]
(node defined in /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py(126): filter2d
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py(188): filter2d_separable
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/gaussian.py(65): gaussian_blur2d
/home/abogusze/repos/anomalib/src/anomalib/models/image/reverse_distillation/anomaly_map.py(90): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/src/anomalib/models/image/reverse_distillation/torch_model.py(85): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(74): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(124): wrapper
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(133): forward
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/jit/_trace.py(1285): _get_trace_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(915): _trace_and_get_graph_from_model
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1011): _create_jit_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1135): _model_to_graph
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(1596): _export
/home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/torch/onnx/utils.py(516): export
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(217): export_to_onnx
/home/abogusze/repos/anomalib/src/anomalib/deploy/export.py(292): export_to_openvino
/home/abogusze/repos/anomalib/src/anomalib/engine/engine.py(910): export
/home/abogusze/repos/anomalib/tmp_train_export.py(21): <module>
)
Inputs:
#0: 1140 defined in (%1140 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Pad[mode="reflect"](%1055, %1139), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:122:0
) (type 'Tensor')
#1: 1164 defined in (%1164 : int[] = prim::ListConstruct(%615, %1146, %1155, %1163), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator
) (type 'List[int]')
Outputs:
#0: 1165 defined in (%1165 : Float(*, *, *, *, strides=[73728, 73728, 288, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0](%1140, %1164), scope: anomalib.deploy.export.InferenceModel::/anomalib.models.image.reverse_distillation.torch_model.ReverseDistillationModel::model/anomalib.models.image.reverse_distillation.anomaly_map.AnomalyMapGenerator::anomaly_map_generator # /home/abogusze/repos/anomalib/venv/lib/python3.10/site-packages/kornia/filters/filter.py:126:0
) (type 'Tensor')
There was an issue in 2022 (#476) related to this same blur, which is why blur.py was created. So I think switching to that inside Reverse Distillation might work.
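To illustrate the idea, here is a minimal sketch of a blur layer whose kernel is built once at construction time, so its shape is a fixed constant when the model is traced. This is not the code in blur.py: `StaticGaussianBlur`, `gaussian_kernel_1d`, and the rule used to derive the kernel size from sigma are assumptions made up for this example, and I have not checked that it exports cleanly on every PyTorch/opset combination.

```python
import torch
import torch.nn.functional as F
from torch import nn


def gaussian_kernel_1d(kernel_size: int, sigma: float) -> torch.Tensor:
    """Return a normalized 1D Gaussian kernel of a fixed, known length."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    kernel = torch.exp(-(coords**2) / (2 * sigma**2))
    return kernel / kernel.sum()


class StaticGaussianBlur(nn.Module):
    """Hypothetical blur layer: the kernel is computed in __init__, not in forward."""

    def __init__(self, channels: int, sigma: float) -> None:
        super().__init__()
        # Assumption: pick an odd kernel size covering roughly 4 sigma per side.
        kernel_size = 2 * int(4.0 * sigma + 0.5) + 1
        k1d = gaussian_kernel_1d(kernel_size, sigma)
        k2d = torch.outer(k1d, k1d)
        # One kernel copy per channel for a depthwise convolution: (C, 1, k, k).
        weight = k2d.expand(channels, 1, kernel_size, kernel_size).clone()
        # A buffer is stored with the module, so tracing sees a constant tensor.
        self.register_buffer("weight", weight)
        self.channels = channels
        self.pad = kernel_size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reflect-pad so the output keeps the input's spatial size.
        x = F.pad(x, (self.pad, self.pad, self.pad, self.pad), mode="reflect")
        return F.conv2d(x, self.weight, groups=self.channels)


if __name__ == "__main__":
    blur = StaticGaussianBlur(channels=1, sigma=1.0)
    anomaly_map = torch.rand(1, 1, 16, 16)
    print(blur(anomaly_map).shape)
```

The difference from the kornia call quoted above is that no kernel is created or reshaped inside `forward`, so the convolution weight does not depend on anything computed during tracing. Note that reflect padding requires the input to be larger than `kernel_size // 2` in each spatial dimension.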