Qualcomm AI Engine Direct - QNN ExecuTorch Intermediate Output Debugger
Summary
- Enabled the ExecuTorch QNN intermediate tensor debugger.
- Provides an API for users to define their own metrics.
- Offers a variety of output formats to visualize the debug results: SVG, CSV, and raw files.
- A README file and a tutorial script guide users through debugging a model. Example script:
python examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py -b build-android -m SM8550 --device $DEVICE --dataset ../imagenet-mini/val/ --dump_intermediate_outputs
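For instance, a user-defined metric can be a plain Python callable over the QNN and CPU tensors. The metric below and the commented-out registration call are hypothetical illustrations, not the actual debugger API:

```python
import torch

def max_abs_error(qnn_output: torch.Tensor, cpu_output: torch.Tensor) -> float:
    """Hypothetical user-defined metric: the largest element-wise deviation
    between a QNN intermediate tensor and its CPU reference."""
    return (qnn_output.float() - cpu_output.float()).abs().max().item()

# Hypothetical registration; consult the debugger README for the real API.
# debugger.register_metric("max_abs_error", max_abs_error)
```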
An example use case
MobileViT V2 has a significant drop in accuracy with certain QNN versions, while QNN 2.29 maintains good accuracy. With the help of the accuracy debugger, we targeted the node native_group_norm_default_1 in the model. As shown below, in QNN 2.29 this node has a cos_similarity (QNN vs. CPU) of 0.997, while all other QNN versions have a cos_similarity of 0, which hints that this group_norm node is possibly the cause of the accuracy drop.
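The cos_similarity metric used here can be reproduced with stock PyTorch. This is a sketch of the computation, not the debugger's internal implementation:

```python
import torch

def cos_similarity(qnn_output: torch.Tensor, cpu_output: torch.Tensor) -> float:
    # Flatten both tensors and compute a single cosine similarity score;
    # values near 1.0 mean the QNN output closely tracks the CPU reference,
    # while values near 0 indicate a severe numerical mismatch.
    return torch.nn.functional.cosine_similarity(
        qnn_output.float().flatten(), cpu_output.float().flatten(), dim=0
    ).item()
```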
What's Coming Next?
- Currently, we dump CPU outputs by manually inserting observer nodes. However, ExecuTorch has a built-in mechanism (intermediate_output_capturer) that can dump intermediate outputs for us as a dict {debug_handle: tensor_output}. We will enable debug_handle and reuse https://github.com/pytorch/executorch/blob/main/devtools/inspector/_intermediate_output_capturer.py in the future instead.
- Support graphs with partitions
- Support LLM models
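To sketch what consuming the capturer's {debug_handle: tensor_output} dict could look like, the snippet below compares it against device-side dumps keyed the same way. The tensors and handle keys are made-up placeholders, and the comparison loop is illustrative rather than the planned implementation:

```python
import torch

# Placeholder stand-ins: the CPU reference dict in the capturer's
# {debug_handle: tensor_output} format, and the matching device-side dumps.
cpu_outputs = {0: torch.ones(2, 2), 1: torch.zeros(3)}
qnn_outputs = {0: torch.ones(2, 2), 1: torch.zeros(3) + 0.1}

gaps = {}
for handle, cpu_tensor in cpu_outputs.items():
    qnn_tensor = qnn_outputs.get(handle)
    if qnn_tensor is None:
        continue  # node missing from the device dump (e.g. fused away)
    gaps[handle] = (qnn_tensor - cpu_tensor).abs().max().item()

for handle, gap in sorted(gaps.items()):
    print(f"debug_handle={handle}: max abs gap={gap:.4f}")
```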
Test plan
- E2E example script test
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleUtilsScript.test_intermediate_debugger -s $DEVICE --model SM8650 --build_folder build-android/ --executorch_root . --image_dataset ../imagenet-mini/val/ --artifact ./e2e_test_debug
- Simple model test
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model --model SM8550 --device $DEVICE --build_folder build-android
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk --model SM8550 --device $DEVICE --build_folder build-android
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15735
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 1 New Failure, 3 Unrelated Failures
As of commit 0dfbbd49c5f25e955769f018fc5971d49f3988be with merge base 82e37dfaac747ecf5b861f68acb152de2468c091:
NEW FAILURE - The following job has failed:
- pull / test-openvino-linux / linux-job (gh)
RuntimeError: Command docker exec -t 8d368358770977e4af718d337b404a647bf3f230cd3c6ce1249737e56ac325a5 /exec failed with exit code 1
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
- pull / unittest / windows / windows-job (gh) (trunk failure)
  backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_8a4w_recipe
- pull / unittest-editable / windows / windows-job (gh) (trunk failure)
  backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_8a4w_recipe
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
- pull / android / run-emulator (gh) (#16137)
Timeout waiting for emulator to boot.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a release notes: label
If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of and include your important work in the next release notes.
To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
It seems like a dependency is missing:
ModuleNotFoundError: No module named 'pydot'
Do you want to introduce this dependency in general?
Thank you for enabling this feature and the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to make the content inside the folder a bit clearer?
@winskuo-quic are you able to try the story LLM and see what cos_similarity it gets?
Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here.
One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.
This might simply be a gap in my understanding, so I’d love to learn more about your experience here:
- Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
- Or were there usability concerns that motivated introducing a separate set of APIs?
If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.
Looking forward to your thoughts!
Hi @cccclai,
Thanks for the suggestion. As there are a couple of install_requirements.sh files in the codebase, do you have any suggestions on which install_requirements.sh we should put this under?
Hi @billmguo,
As mentioned in the PR summary and the Limitations section of backends/qualcomm/debugger/README.md, LLM models are currently unsupported. This is on our TODO list and we will enable it in the future.
Thanks.
Hi @Gasoonjia,
Thanks for reviewing the PR. The reasoning is written in the PR summary. I will also share more details in the email thread. Thanks.
Just to confirm, do you mean moving from executorch/examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py? Thanks.
Hi @winskuo-quic,
Thanks again for your feedback, and your email reply. I understand this work may have been created quite a while ago in a private context.
I’m wondering whether there is any plan to migrate or align the current QNN debugger with the native ExecuTorch devtools — not just reusing intermediate_output_capturer or debug_handle, but more broadly integrating with the full debugging pipeline built around the Inspector.calculate_numeric_gap API for intermediate-output numerical discrepancy detection. Unifying these efforts could help us reduce divergence, avoid reinventing functionality, and centralize future debugging capabilities.
If you’re open to it, I’d be happy to discuss potential directions for collaboration on operator-level numerical discrepancy detection and how we can streamline the work going forward.
Thanks
Hi @Gasoonjia,
I think we are also aiming to reduce as much code redundancy as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap.
We could discuss in more detail in the future how we could potentially unify and migrate some features if you think they can be combined.
Thanks
Thanks for the efficient reply and for sharing your thoughts regarding our current API.
With https://github.com/pytorch/executorch/pull/15969, we can now create customized metrics for numerical gap detection.
I'm more than happy to hear more of your thoughts regarding the API, and I'm looking forward to future cooperation!
Thanks for sharing the PR. I think this would be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled! In the meantime, since debug_handle is not yet enabled, does this PR look fine to you? Thanks
This PR looks fine to me, and thanks for your contribution! I will let @cccclai give the final stamp. I'm looking forward to working with you to contribute to devtools directly together in the future.
@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87936803.
It seems like I merged some PRs that conflict with this PR... can you rebase again?
Done. Thanks
There are some internal errors; I need to send a patch.
Can you apply these changes?
--- a/executorch/backends/qualcomm/debugger/TARGETS
+++ b/executorch/backends/qualcomm/debugger/TARGETS
@@ -10,3 +10,21 @@
"fbsource//third-party/pypi/pandas:pandas",
]
)
+
+runtime.python_library(
+ name = "qnn_intermediate_debugger",
+ srcs = [
+ "format_outputs.py",
+ "metrics_evaluator.py",
+ "qnn_intermediate_debugger.py",
+ ],
+ deps = [
+ "//caffe2:torch",
+ "//executorch/backends/qualcomm/_passes:passes",
+ "//executorch/backends/qualcomm/utils:utils",
+ "//executorch/devtools:lib",
+ "//executorch/exir:sym_util",
+ "fbsource//third-party/pypi/graphviz:graphviz",
+ "fbsource//third-party/pypi/pandas:pandas",
+ ],
+)
diff --git a/executorch/backends/qualcomm/tests/TARGETS b/executorch/backends/qualcomm/tests/TARGETS
--- a/executorch/backends/qualcomm/tests/TARGETS
+++ b/executorch/backends/qualcomm/tests/TARGETS
@@ -35,6 +35,7 @@
"//executorch/examples/qualcomm:utils",
"//executorch/examples/models:models",
"//executorch/backends/qualcomm/debugger:utils",
+ "//executorch/backends/qualcomm/debugger:qnn_intermediate_debugger",
],
)
Sorry, I missed this message. I have applied the patch and pushed a new commit. Please have a look. Thanks