Fix ONNX Attention and torch SDPA quantization handling
### Changes
- added `ONNXAttentionMetatype` for the opset 23 `Attention` ONNX node (see the sketches after this list)
- fixed `scaled_dot_product_attention` quantization in `torch2` for the case when `Q`, `K` and `V` are parallel edges coming from the same input node
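For context, here is a minimal, hypothetical sketch of each case. Neither is the exact model used in the tests below, and the names (`make_attention_model`, `UnbindSDPAModel`) are illustrative only.

The opset 23 `Attention` node that the new `ONNXAttentionMetatype` covers can be built with the standard `onnx.helper` API (constructing and checking this model assumes `onnx>=1.18.0`):

```python
import onnx
from onnx import TensorProto, helper


def make_attention_model() -> onnx.ModelProto:
    # Single Attention node with 4D [batch, num_heads, seq_len, head_dim] inputs.
    attn = helper.make_node("Attention", inputs=["Q", "K", "V"], outputs=["Y"])
    shape = [2, 4, 8, 16]
    graph = helper.make_graph(
        [attn],
        "attention_graph",
        inputs=[
            helper.make_tensor_value_info(name, TensorProto.FLOAT, shape)
            for name in ("Q", "K", "V")
        ],
        outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, shape)],
    )
    # Attention was introduced in opset 23 (shipped with onnx 1.18.0).
    return helper.make_model(graph, opset_imports=[helper.make_opsetid("", 23)])


onnx.checker.check_model(make_attention_model())
```

The `torch2` fix targets the pattern where a single producer node feeds all three `scaled_dot_product_attention` inputs, e.g. via `torch.unbind`, so `Q`, `K` and `V` are parallel edges coming from the same node:

```python
import torch
import torch.nn.functional as F


class UnbindSDPAModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One unbind node produces Q, K and V, so the quantizer sees three
        # parallel edges coming out of the same input node.
        q, k, v = x.unbind(0)
        return F.scaled_dot_product_attention(q, k, v)


out = UnbindSDPAModel()(torch.randn(3, 2, 4, 8, 16))
```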
### Reason for changes
See #3750
### Related tickets
Fixes #3750
### Tests
- `tests/onnx/quantization/test_graphs.py::test_synthetic_models_graph[AttentionModel]`
- `tests/torch2/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[unbind_scaled_dot_product_attention_model]`
Hm, IIRC `onnx` added support for opset 23 in version 1.18.0. So the new test is currently failing in CI due to:

```
onnx==1.17.0; python_version < '3.13'
onnx==1.18.0; python_version >= '3.13'
```

Do you have any preference on whether I should mark this test as

```python
import onnx
import pytest
from packaging import version

@pytest.mark.skipif(
    version.parse(onnx.__version__) < version.parse("1.18.0"),
    reason="Opset 23 was added in onnx 1.18.0",
)
```

bump the version, or do something else?
Hi @ruro, thanks for your contribution. We currently support multiple versions of ONNX, and the Attention operator was added in opset 23, which corresponds to ONNX 1.18.0. I believe we should run this test only for ONNX versions >= 1.18.0.