Qualcomm AI Engine Direct - QNN ExecuTorch Intermediate Output Debugger
Summary
- Enabled the ExecuTorch QNN intermediate tensor debugger.
- Provides an API for users to define their own metrics.
- Offers a variety of output formats to visualize the debug results: SVG, CSV, and raw files.
- A README file and a tutorial script guide users through debugging a model. Example script:
python examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py -b build-android -m SM8550 --device $DEVICE --dataset ../imagenet-mini/val/ --dump_intermediate_outputs
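For instance, a user-defined metric can be a plain Python callable over the QNN and CPU tensors. The metric below and the commented-out registration call are hypothetical illustrations, not the actual debugger API:

```python
import torch

def max_abs_error(qnn_output: torch.Tensor, cpu_output: torch.Tensor) -> float:
    """Hypothetical user-defined metric: the largest element-wise deviation
    between a QNN intermediate tensor and its CPU reference."""
    return (qnn_output.float() - cpu_output.float()).abs().max().item()

# Hypothetical registration; consult the debugger README for the real API.
# debugger.register_metric("max_abs_error", max_abs_error)
```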
An example use case
MobileViT V2 has a significant drop in accuracy with certain QNN versions, while QNN 2.29 maintains good accuracy. With the help of the accuracy debugger, we targeted the node native_group_norm_default_1 in the model. As shown below, in QNN 2.29 this node has a cos_similarity (QNN vs. CPU) of 0.997, while all other QNN versions have a cos_similarity of 0, which hints that this group_norm node is possibly the cause of the accuracy drop.
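The cos_similarity metric used here can be reproduced with stock PyTorch. This is a sketch of the computation, not the debugger's internal implementation:

```python
import torch

def cos_similarity(qnn_output: torch.Tensor, cpu_output: torch.Tensor) -> float:
    # Flatten both tensors and compute a single cosine similarity score;
    # values near 1.0 mean the QNN output closely tracks the CPU reference,
    # while values near 0 indicate a severe numerical mismatch.
    return torch.nn.functional.cosine_similarity(
        qnn_output.float().flatten(), cpu_output.float().flatten(), dim=0
    ).item()
```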
What's Coming Next?
- Currently, we dump CPU outputs by manually inserting observer nodes. However, ExecuTorch has a built-in mechanism (intermediate_output_capturer) that can dump intermediate outputs for us as a dict {debug_handle: tensor_output}. We will enable debug_handle and reuse https://github.com/pytorch/executorch/blob/main/devtools/inspector/_intermediate_output_capturer.py in the future instead.
- Support graphs with partitions
- Support LLM models
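To sketch what consuming the capturer's {debug_handle: tensor_output} dict could look like, the snippet below compares it against device-side dumps keyed the same way. The tensors and handle keys are made-up placeholders, and the comparison loop is illustrative rather than the planned implementation:

```python
import torch

# Placeholder stand-ins: the CPU reference dict in the capturer's
# {debug_handle: tensor_output} format, and the matching device-side dumps.
cpu_outputs = {0: torch.ones(2, 2), 1: torch.zeros(3)}
qnn_outputs = {0: torch.ones(2, 2), 1: torch.zeros(3) + 0.1}

gaps = {}
for handle, cpu_tensor in cpu_outputs.items():
    qnn_tensor = qnn_outputs.get(handle)
    if qnn_tensor is None:
        continue  # node missing from the device dump (e.g. fused away)
    gaps[handle] = (qnn_tensor - cpu_tensor).abs().max().item()

for handle, gap in sorted(gaps.items()):
    print(f"debug_handle={handle}: max abs gap={gap:.4f}")
```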
Test plan
- E2E example script test
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleUtilsScript.test_intermediate_debugger -s $DEVICE --model SM8650 --build_folder build-android/ --executorch_root . --image_dataset ../imagenet-mini/val/ --artifact ./e2e_test_debug
- Simple model test
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model --model SM8550 --device $DEVICE --build_folder build-android
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk --model SM8550 --device $DEVICE --build_folder build-android
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15735
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 1 New Failure, 3 Unrelated Failures
As of commit 0dfbbd49c5f25e955769f018fc5971d49f3988be with merge base 82e37dfaac747ecf5b861f68acb152de2468c091:
NEW FAILURE - The following job has failed:
- pull / test-openvino-linux / linux-job (gh)
RuntimeError: Command docker exec -t 8d368358770977e4af718d337b404a647bf3f230cd3c6ce1249737e56ac325a5 /exec failed with exit code 1
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
- pull / unittest / windows / windows-job (gh) (trunk failure)
  backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_8a4w_recipe
- pull / unittest-editable / windows / windows-job (gh) (trunk failure)
  backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_8a4w_recipe
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
- pull / android / run-emulator (gh) (#16137)
Timeout waiting for emulator to boot.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a release notes: label
If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of and include your important work in the next release notes.
To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
It seems like a dependency is missing:
ModuleNotFoundError: No module named 'pydot'
Do you want to introduce this dependency in general?
Thank you for enabling this feature and the detailed documentation! Just a minor comment: can we move examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py to make the content inside the folder a bit clearer?
@winskuo-quic are you able to try the story LLM and see what cos_similarity it gets?
Hi @winskuo-quic Thanks so much for this contribution — we really appreciate the Qualcomm team’s work here.
One question I had while reviewing the PR: several parts of the implementation seem to re-create functionality that already exists in ExecuTorch’s devtools (intermediate_output_capturer, numeric comparators, Inspector.calculate_numeric_gap, debug-handle-based operator matching, etc.), rather than extending the shared workflows.
This might simply be a gap in my understanding, so I’d love to learn more about your experience here:
- Were there limitations or missing features in the current Inspector-based workflow that made it difficult to apply to QNN?
- Or were there usability concerns that motivated introducing a separate set of APIs?
If there are gaps, we’d be very happy to collaborate and strengthen the shared debugging tools so all backends can benefit from a unified workflow.
Looking forward to your thoughts!
Hi @cccclai,
Thanks for the suggestion. As there are a couple of install_requirements.sh files in the codebase, do you have any suggestions on which install_requirements.sh we should put this under?
Hi @billmguo,
As mentioned in the PR summary and the Limitations section of backends/qualcomm/debugger/README.md, LLM models are currently unsupported. This is on our TODO list and we will enable it in the future.
Thanks.
Hi @Gasoonjia,
Thanks for reviewing the PR. The reasoning is written in the PR summary. I will also share more details in the email thread. Thanks.
Just to confirm, do you mean moving from executorch/examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py to examples/qualcomm/devtools/qnn_intermediate_debugger_demo.py? Thanks.
Hi @winskuo-quic,
Thanks again for your feedback, and your email reply. I understand this work may have been created quite a while ago in a private context.
I’m wondering whether there is any plan to migrate or align the current QNN debugger with the native ExecuTorch devtools — not just reusing intermediate_output_capturer or debug_handle, but more broadly integrating with the full debugging pipeline built around the Inspector.calculate_numeric_gap API for intermediate-output numerical discrepancy detection. Unifying these efforts could help us reduce divergence, avoid reinventing functionality, and centralize future debugging capabilities.
If you’re open to it, I’d be happy to discuss potential directions for collaboration on operator-level numerical discrepancy detection and how we can streamline the work going forward.
Thanks
Hi @Gasoonjia,
I think we are also aiming to reduce as much code redundancy as possible, which should align with your plan. I believe Inspector.calculate_numeric_gap will be helpful when debugging; however, we might still want to keep some of our features, such as the ability to draw the .svg graphs. Also, if I understand correctly, Inspector.calculate_numeric_gap currently only supports MSE, L1, and SNR. It would be awesome if we could have some public APIs for users to define their own metrics for the numeric gap.
We could discuss in more detail in the future how we could potentially unify and migrate some features if you think they can be combined.
Thanks
Thanks for the efficient reply and for sharing your thoughts regarding our current API.
With https://github.com/pytorch/executorch/pull/15969, we can now create customized metrics for numerical gap detection.
I'm more than happy to hear more of your thoughts regarding the API, and I'm looking forward to future cooperation!
Thanks for sharing the PR. I think this would be super helpful. Looking forward to future cooperation and to transitioning to this API once debug_handle is enabled! In the meantime, since debug_handle is not yet enabled, does this PR look fine to you? Thanks
This PR looks fine to me, and thanks for your contribution! I will let @cccclai give the final stamp. I'm looking forward to working with you to contribute to devtools directly together in the future.
@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87936803.
It seems like I merged some PRs that conflict with this PR... can you rebase again?
Done. Thanks
There are some internal errors; I need to send a patch.
Can you apply these changes?
--- a/executorch/backends/qualcomm/debugger/TARGETS
+++ b/executorch/backends/qualcomm/debugger/TARGETS
@@ -10,3 +10,21 @@
"fbsource//third-party/pypi/pandas:pandas",
]
)
+
+runtime.python_library(
+ name = "qnn_intermediate_debugger",
+ srcs = [
+ "format_outputs.py",
+ "metrics_evaluator.py",
+ "qnn_intermediate_debugger.py",
+ ],
+ deps = [
+ "//caffe2:torch",
+ "//executorch/backends/qualcomm/_passes:passes",
+ "//executorch/backends/qualcomm/utils:utils",
+ "//executorch/devtools:lib",
+ "//executorch/exir:sym_util",
+ "fbsource//third-party/pypi/graphviz:graphviz",
+ "fbsource//third-party/pypi/pandas:pandas",
+ ],
+)
diff --git a/executorch/backends/qualcomm/tests/TARGETS b/executorch/backends/qualcomm/tests/TARGETS
--- a/executorch/backends/qualcomm/tests/TARGETS
+++ b/executorch/backends/qualcomm/tests/TARGETS
@@ -35,6 +35,7 @@
"//executorch/examples/qualcomm:utils",
"//executorch/examples/models:models",
"//executorch/backends/qualcomm/debugger:utils",
+ "//executorch/backends/qualcomm/debugger:qnn_intermediate_debugger",
],
)
Sorry, I missed this message. I have applied the patch and pushed a new commit. Please have a look. Thanks