
Fix onnx Attention and torch SDPA quantization handling

Open ruro opened this issue 1 month ago • 2 comments

Changes

  • added ONNXAttentionMetatype for the opset 23 ONNX Attention node
  • fixed scaled_dot_product_attention quantization in torch2 for the case where Q, K, and V are parallel edges coming from the same input node
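
For context, the second bullet covers graphs where a single producer node feeds all three attention inputs, as in the `unbind_scaled_dot_product_attention_model` test. Here is a minimal NumPy sketch of that pattern (the function and variable names are illustrative only, not NNCF or test APIs):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Plain softmax(Q K^T / sqrt(d)) V, mirroring the semantics of
    # torch.nn.functional.scaled_dot_product_attention (no mask, no dropout).
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Q, K and V arrive as parallel edges from the same producer node:
# one (3, seq, dim) tensor is split along dim 0, analogous to
# torch.unbind(qkv, dim=0) in the torch2 test model.
qkv = np.random.default_rng(0).standard_normal((3, 4, 8))
q, k, v = qkv
out = scaled_dot_product_attention(q[None], k[None], v[None])
assert out.shape == (1, 4, 8)
```

Because all three edges trace back to one node, a quantizer-placement pass that keys on the input node alone can end up handling Q, K, and V inconsistently, which is the situation the fix addresses.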

Reason for changes

See #3750

Related tickets

Fixes #3750

Tests

  • tests/onnx/quantization/test_graphs.py::test_synthetic_models_graph[AttentionModel] attention_model dot

  • tests/torch2/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[unbind_scaled_dot_product_attention_model] unbind_scaled_dot_product_attention_model dot

ruro avatar Nov 21 '25 08:11 ruro

Hmm, IIRC ONNX added support for opset 23 in version 1.18.0, so the new test is currently failing in CI due to the pinned requirements:

onnx==1.17.0; python_version < '3.13'
onnx==1.18.0; python_version >= '3.13'

Do you have a preference here? Should I mark this test as

@pytest.mark.skipif(
    version.parse(onnx.__version__) < version.parse("1.18.0"),
    reason="Opset 23 was added in onnx 1.18.0",
)

or bump the pinned version, or do something else?
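
The skipif condition above boils down to a simple version comparison. As a standalone sketch (using packaging, which pytest already depends on; the helper name is illustrative):

```python
from packaging import version

def requires_skip(onnx_version: str, minimum: str = "1.18.0") -> bool:
    # True when the installed onnx predates opset 23 (added in 1.18.0),
    # i.e. when the Attention test should be skipped.
    return version.parse(onnx_version) < version.parse(minimum)

assert requires_skip("1.17.0")      # the pin for python < 3.13: test skipped
assert not requires_skip("1.18.0")  # the pin for python >= 3.13: test runs
```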

ruro avatar Nov 21 '25 10:11 ruro

Hi @ruro, thanks for your contribution. We currently support multiple versions of ONNX, and the Attention operator was added in opset 23, which corresponds to ONNX 1.18.0. I believe we should run this test only for ONNX versions >= 1.18.0.

andrey-churkin avatar Nov 25 '25 09:11 andrey-churkin