Fix custom op example
There were some refactors, and it looks like the custom op example was not covered in CI.
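For context, the example delegates a user-defined `my_ops.mul3` op to QNN (it shows up as `my_ops.mul3.default` in the partitioner log below). A rough sketch of how such an op can be registered via `torch.library` — illustrative only, the actual `custom_ops_1.py` may register it differently:

```python
# Illustrative sketch; examples/qualcomm/custom_op/custom_ops_1.py may differ.
import torch
from torch.library import Library

# Define a "my_ops" namespace containing mul3, which appears as
# my_ops.mul3.default in the partitioner log below.
my_op_lib = Library("my_ops", "DEF")
my_op_lib.define("mul3(Tensor input) -> Tensor")

def mul3_impl(x: torch.Tensor) -> torch.Tensor:
    return x * 3

# One kernel for all backends; export tracing also uses this implementation.
my_op_lib.impl("mul3", mul3_impl, "CompositeExplicitAutograd")

class Mul3(torch.nn.Module):
    def forward(self, x):
        return torch.ops.my_ops.mul3.default(x)
```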
Run

`python3 examples/qualcomm/custom_op/custom_ops_1.py --build_folder build-android -s R3CY50HEGYM -m SM8750 --op_package_dir examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage --build_op_package`

and the output is:
<details>
<summary>Output log</summary>

Quantizing(PTQ) the model...
WARNING:root:Op aten.unbind.int was requested for preservation by partitioner. This request is ignored because it is in a blocklist.
WARNING:root:Op aten.unbind.int was requested for preservation by partitioner. This request is ignored because it is in a blocklist.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 1
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
[QNN Partitioner Op Support]: my_ops.mul3.default | True
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
INFO:executorch.backends.qualcomm.partition.qnn_partitioner:Qnn partitioner will delegate torch mutable buffer with the same I/O address during the runtime, so if your model contains mutable buffer, then you can get the better performance with skip_mutable_buffer=False. If you encounter accuracy issue during the runtime, then please set `skip_mutable_buffer=True` and try again.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 1
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
INFO:executorch.backends.qualcomm.qnn_preprocess:Processing Method(0): (1/1)
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: quantized_decomposed_quantize_per_tensor_default, quantized_decomposed.quantize_per_tensor.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: my_ops_mul3_default, my_ops.mul3.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: quantized_decomposed_dequantize_per_tensor_tensor, quantized_decomposed.dequantize_per_tensor.tensor
====== DDR bandwidth summary ======
spill_bytes=0
fill_bytes=0
write_total_bytes=131584
read_total_bytes=125440
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
WARNING:root:Op aten.unbind.int was requested for preservation by partitioner. This request is ignored because it is in a blocklist.
./custom_op/custom_qnn.pte: 1 file pushed, 0 skipped. 166.7 MB/s (31652 bytes in 0.000s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/aarch64-android/libQnnHtp.so: 1 file pushed, 0 skipped. 302.9 MB/s (2193976 bytes in 0.007s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so: 1 file pushed, 0 skipped. 384.9 MB/s (9087648 bytes in 0.023s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/aarch64-android/libQnnHtpV79Stub.so: 1 file pushed, 0 skipped. 263.7 MB/s (477208 bytes in 0.002s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/aarch64-android/libQnnHtpPrepare.so: 1 file pushed, 0 skipped. 373.1 MB/s (52389040 bytes in 0.134s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/aarch64-android/libQnnSystem.so: 1 file pushed, 0 skipped. 266.2 MB/s (2497656 bytes in 0.009s)
build-android/examples/qualcomm/executor_runner/qnn_executor_runner: 1 file pushed, 0 skipped. 444.4 MB/s (45963304 bytes in 0.099s)
build-android/backends/qualcomm/libqnn_executorch_backend.so: 1 file pushed, 0 skipped. 240.5 MB/s (646624 bytes in 0.003s)
/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.34/lib/aarch64-android/libQnnModelDlc.so: 1 file pushed, 0 skipped. 303.0 MB/s (2430512 bytes in 0.008s)
/data/users/chenlai/executorch/custom_op/input_list.txt: 1 file pushed, 0 skipped. 0.1 MB/s (14 bytes in 0.000s)
/data/users/chenlai/executorch/custom_op/input_0_0.raw: 1 file pushed, 0 skipped. 288.0 MB/s (100352 bytes in 0.000s)
examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage/build/hexagon-v79/libQnnExampleOpPackage_HTP.so: 1 file pushed, 0 skipped. 110.0 MB/s (177136 bytes in 0.002s)
examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage/build/aarch64-android/libQnnExampleOpPackage.so: 1 file pushed, 0 skipped. 340.1 MB/s (874888 bytes in 0.002s)
I 00:00:00.000608 executorch:qnn_executor_runner.cpp:232] Model file custom_qnn.pte is loaded.
I 00:00:00.000707 executorch:qnn_executor_runner.cpp:242] Using method forward
I 00:00:00.000718 executorch:qnn_executor_runner.cpp:289] Setting up planned buffer 0, size 200704.
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 1
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000
[INFO] [Qnn ExecuTorch]: Running level=1 optimization.
I 00:00:00.283815 executorch:qnn_executor_runner.cpp:313] Method loaded.
E 00:00:00.284038 executorch:method.cpp:1274] Output 0 is memory planned, or is a constant. Cannot override the existing data pointer.
I 00:00:00.284055 executorch:qnn_executor_runner.cpp:373] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.284061 executorch:qnn_executor_runner.cpp:376] Inputs prepared.
I 00:00:00.284115 executorch:qnn_executor_runner.cpp:382] Number of inputs: 1
I 00:00:00.284290 executorch:qnn_executor_runner.cpp:490] Perform 10 inference for warming up
I 00:00:04.366009 executorch:qnn_executor_runner.cpp:496] Start inference (0)
I 00:00:04.781286 executorch:qnn_executor_runner.cpp:514] 1 inference took 415.036000 ms, avg 415.036000 ms
I 00:00:04.782228 executorch:qnn_executor_runner.cpp:550] Total 1 inference took 415.036000 ms, avg 415.036000 ms
I 00:00:04.782429 executorch:qnn_executor_runner.cpp:615] Write etdump to /data/local/tmp/executorch/custom_qnn/etdump.etdp, Size = 1984
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Function not called, PrepareLib isn't loaded!
/data/local/tmp/executorch/custom_qnn/outputs/: 1 file pulled, 0 skipped. 0.9 MB/s (100352 bytes in 0.104s)
is_close? True
</details>
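The final `is_close? True` is the accuracy check at the end of the run. A hypothetical reconstruction of what that check amounts to — the output file name, tensor shape, and tolerance are assumptions inferred from the 100352-byte (25088 × float32) input/output files in the log:

```python
# Hypothetical reconstruction of the "is_close?" check; file names, shape,
# and tolerance are assumptions, not taken from the actual script.
import numpy as np

shape = (1, 32, 28, 28)  # guess: 25088 float32 values = 100352 bytes
inp = np.fromfile("custom_op/input_0_0.raw", dtype=np.float32).reshape(shape)
golden = inp * 3  # eager reference for my_ops.mul3
device_out = np.fromfile(
    "custom_op/outputs/output_0_0.raw", dtype=np.float32
).reshape(shape)
# Loose tolerance because the delegated graph is quantized (PTQ).
print("is_close?", np.allclose(device_out, golden, atol=1e-2))
```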
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15483
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 18 New Failures, 1 Unrelated Failure
As of commit 50b850fb93803d9dc8ea8056ea722a7fc6f88048 with merge base 11f752cf84b296a39c0b74b889d618f279bc8186:
NEW FAILURES - The following jobs have failed:
- pull / unittest / linux / linux-job (gh): backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
- pull / unittest / macos / macos-job (gh): backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
- pull / unittest-editable / linux / linux-job (gh): backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
- pull / unittest-editable / macos / macos-job (gh): backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
- Test CUDA Builds / export-model-cuda-artifact (google, gemma-3-4b-it, non-quantized) / linux-job (gh): RuntimeError: Command docker exec -t 1478c64c8c17b249ebac7f1e2d953296c663eadeaca99ebf89bef7be07cbf12e /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (google, gemma-3-4b-it, quantized-int4-tile-packed) / linux-job (gh): RuntimeError: Command docker exec -t 5b9fc0a4b65a70c6a877f0b48736497b819d62462e1048e9899de7c92226fa3b /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (mistralai, Voxtral-Mini-3B-2507, non-quantized) / linux-job (gh): RuntimeError: Command docker exec -t ace7bab1668e3eb20e11e586e4df793cdfcbd74e05648f4d3a7f44831cf8d6c6 /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-tile-packed) / linux-job (gh): RuntimeError: Command docker exec -t 9279549c981045441b84a12e31df8b334755eaba91f78f52daf6c3b428e8ad33 /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-weight-only) / linux-job (gh): RuntimeError: Command docker exec -t 4a292766c0e372d7ab9eceec6487bc224a15a5729b1fabdeae1cb4e413a17b4a /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-large-v3-turbo, non-quantized) / linux-job (gh): RuntimeError: Command docker exec -t fae2221e9f5241f57e160ede1d60d4d2cec260ddaf26f2df1260f47c62717c54 /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh): RuntimeError: Command docker exec -t 9fa6ff08a0316d8a8983d0ce4e3785185608e2e85e7fcaee706b4e2329a788db /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job (gh): RuntimeError: Command docker exec -t f5b967e8b943c139de4ff5d4fc5aec7019b7e9ab307c3a7808caddc4824b54df /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-small, non-quantized) / linux-job (gh): RuntimeError: Command docker exec -t af8cdc5e959e0b27795222e6b2ebf75841aa9652b0f40eb09e17cf3ffda892e3 /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-small, quantized-int4-tile-packed) / linux-job (gh): RuntimeError: Command docker exec -t 579e5c25c89fdab0935afbe609c3eebf52759566f7b9d11f5a3436026e438eab /exec failed with exit code 1
- Test CUDA Builds / export-model-cuda-artifact (openai, whisper-small, quantized-int4-weight-only) / linux-job (gh): RuntimeError: Command docker exec -t 4c84298a6ca5c3d2ef29f6b8835c55ea572ed7e41e80e8f42a5b73dccce4c681 /exec failed with exit code 1
- Test Metal Backend / export-model-metal-artifact (mistralai, Voxtral-Mini-3B-2507, non-quantized) / macos-job (gh): RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 127
- Test Metal Backend / export-model-metal-artifact (openai, whisper-large-v3-turbo, non-quantized) / macos-job (gh): RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 127
- Test Metal Backend / export-model-metal-artifact (openai, whisper-small, non-quantized) / macos-job (gh): RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 127
FLAKY - The following job failed but was likely due to flakiness present on trunk:
- pull / test-binary-size-linux / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a release notes: label
If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.
To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
Can I get a review on this?
Address comments