executorch Add event tracing and ETDumps to executor

Enabled via EXECUTORCH_ENABLE_EVENT_TRACER
Add flag 'etdump_path' to specify the file path for the ETDump file
Add flag 'num_executions' for number of iterations to run
Create and pass event tracer 'ETDumpGen'
Save ETDump to disk
Update docs to reflect the changes

Re-upload of https://github.com/pytorch/executorch/pull/4502 to discuss with @GregoryComer.

Sep 02 '24 11:09 benkli01

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5027

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit 662cb81726972c7f0e305d2ed0d4adc344686076 with merge base 7bc06d1a22a19c2a9546e66aae550572935ffa27 (): :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Sep 02 '24 11:09 pytorch-bot[bot]

@pytorchbot label 'partner: arm'

Sep 02 '24 11:09 benkli01

@pytorchbot label ciflow/trunk

Sep 02 '24 11:09 benkli01

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

Sep 02 '24 11:09 pytorch-bot[bot]

Hi @GregoryComer. Would it be possible to run the CI on your side to see if the issue from the previous PR is still occurring? I'm having a hard time understanding where this comes from.

Sep 04 '24 14:09 benkli01

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sep 10 '24 02:09 facebook-github-bot

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sep 12 '24 16:09 facebook-github-bot

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sep 17 '24 14:09 facebook-github-bot

Hi @dbort . Sorry for dragging you into this, but I saw your comment on EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT in the code, so I thought you might be able to help with resolving the failing test here. Any idea how to fix this?

Sep 23 '24 15:09 benkli01

I don't see a CI failure anymore

Sep 23 '24 15:09 digantdesai

I don't see a CI failure anymore

@digantdesai To me pull / test-llama-runner-qnn-linux (fp32, cmake, qnn) / linux-job (pull_request) is showing up as failing after my latest update. The CI run for the previous version you imported did not finish for me, i.e. I could not see any results, but it did not seem to have this test included anyway.

Sep 23 '24 15:09 benkli01

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sep 24 '24 18:09 facebook-github-bot

Yeah I see, from the main CMakeList, and qnn does have EXECUTORCH_ENABLE_EVENT_TRACER=ON

Sep 24 '24 18:09 digantdesai

Any update on this?

Sep 30 '24 15:09 digantdesai

Hi @digantdesai, I'm still hoping for some pointer from @dbort or you as I'm struggling to reproduce it locally and can't really make sense of the error.

Sep 30 '24 16:09 benkli01

@digantdesai will you have a look at this one since it touches code outside arm delegate. Thx!

Nov 29 '24 07:11 freddan80

The error shows up when running this script https://github.com/pytorch/executorch/blob/main/backends/qualcomm/scripts/build.sh based on the log.

If you have a linux machine, can you follow https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html and see if the script fails?

Nov 30 '24 18:11 cccclai

@cccclai I finally managed to reproduce the issue by running script backends/qualcomm/scripts/build.sh with the parameters from the CI script here. Interestingly, the issue seems to be caused by --job_number 2 (not my first guess). If I remove the parameter entirely, defaulting to --job_number 16, the issue disappears (not sure this is an acceptable solution and/or would work in the CI). I'm guessing that this is related to the TODO here. Any input on how to proceed would be much appreciated.

Dec 02 '24 16:12 benkli01

@cccclai I finally managed to reproduce the issue by running script backends/qualcomm/scripts/build.sh with the parameters from the CI script here. Interestingly, the issue seems to be caused by --job_number 2 (not my first guess). If I remove the parameter entirely, defaulting to --job_number 16, the issue disappears (not sure this is an acceptable solution and/or would work in the CI). I'm guessing that this is related to the TODO here. Any input on how to proceed would be much appreciated.

Ah I remember that. @dbort and @Olivia-liu, any thought on this?

Dec 04 '24 17:12 cccclai

Also for what it's worth, I'm trying to merge https://github.com/dvidelabs/flatcc/pull/306 into upstream flatcc to let us remove -DEXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT and similar hacks

Dec 05 '24 01:12 dbort

gmake[2]: *** No rule to make target '../third-party/flatcc/lib/libflatccrt.a', needed by 'executor_runner'.  Stop.

Thanks @dbort , this is the error exactly.

I'm not sure about the workaround to use release mode. The command used in the ci script is already using --release. As mentioned above, removing --num_jobs 2 seems to work locally, but it's a strange workaround and might not work in CI.

Dec 05 '24 16:12 benkli01

I'm not sure about the workaround to use release mode. The command used in the ci script is already using --release.

Ok, thanks for looking into that.

As mentioned above, removing --num_jobs 2 seems to work locally, but it's a strange workaround and might not work in CI.

That means that there's some kind of race condition.

Based on your PR, it looks like executor_runner has a proper dependency on libflatccrt; otherwise I would have expected a use-without-dependency situation.

...except maybe it doesn't. This PR adds a dep on "${FLATCCRT_LIB}" from the top-level CMakeLists.txt, but when I look for a place that sets FLATCCRT_LIB I only see it in the cmake config file at https://github.com/pytorch/executorch/blob/538bfaf3d5a25baf2d094c178b0fe9e708ed16f2/build/executorch-config.cmake#L51-L55

afaik, that config isn't included in the top-level cmake system. That file is used to point to an already-built version of the core ET libs from external projects, like https://github.com/pytorch/executorch/blob/538bfaf3d5a25baf2d094c178b0fe9e708ed16f2/examples/devtools/CMakeLists.txt#L39

If FLATCCRT_LIB is empty in this PR at https://github.com/pytorch/executorch/pull/5027/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR817-R819

list(APPEND _executor_runner_libs etdump ${FLATCCRT_LIB})

then executor_runner wouldn't properly depend on libflatccrt.a. A parallel build could cause that lib to be coincidentally built earlier with -j16, "fixing" the problem, while -j2 would be less likely to do so.

Could you try printing the value of FLATCCRT_LIB from the top-level CMakeLists.txt to see if it's empty?

Dec 07 '24 01:12 dbort

Though theoretically executor_runner shouldn't even need to know about libflatccrt: it should inherit the dep from the PUBLIC section of

https://github.com/pytorch/executorch/blob/538bfaf3d5a25baf2d094c178b0fe9e708ed16f2/devtools/CMakeLists.txt#L179-L183

But in this case, you could try updating this PR to use flatccrt as the dep instead of using ${FLATCCRT_LIB}. Even if FLATCCRT_LIB were defined, I think it's actually wrong to use it as the dep name -- I believe that the target is always called flatccrt even if, in debug mode, the file that it generates is called libflatccrt_d.a.

Dec 07 '24 01:12 dbort

Hi @dbort . I tried fixing the flatccrt dependency as suggested, but without any effect:

Replace ${FLATCCRT_LIB} with flatccrt
Remove flatccrt dependency completely and rely on inherited dependency from etdump

(I did this both in CMakeLists.txt and examples/qualcomm/executor_runner/CMakeLists.txt)

I did find a new workaround though, which should be more stable than just removing the --num_jobs 2: Run the command to build the QNN SDK twice, first clean then without cleaning in file .ci/scripts/build-qnn-sdk.sh.

I feel like the flatccrt issue is not related to my change so I will open an issue for it. I can push the above workaround in a separate PR. I will be off from tomorrow until next year, but I really hope we can find a solution together to get this PR merged.

Dec 12 '24 12:12 benkli01

@dbort has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Jan 22 '25 23:01 facebook-github-bot

Thanks for updating the dependency, and again I apologize for how long this has taken me to review. I'm running internal tests now and should be able to merge this soon.

Jan 22 '25 23:01 dbort

Thanks @dbort ! There is a linker error remaining in the qnn tests that I could not reproduce locally and I don't understand where it is coming from. Maybe you have an idea...

Jan 23 '25 10:01 benkli01

Thanks @dbort ! I managed to find and fix the issue by following your pointers a little further. I think the remaining CI failures are not related to my change.

The problem was that the QNN backend always set the definition ET_EVENT_TRACER_ENABLED without taking into account the status of CMake flag EXECUTORCH_ENABLE_EVENT_TRACER. I.e. when building with EXECUTORCH_BUILD_QNN=ON the definition ET_EVENT_TRACER_ENABLED was always enabled at compile time, but the needed libs might not have been linked depending on the CMake flags.

The fix: I removed the definition in the QNN backend here so that the behavior should now be fully controlled by the CMake flags.

Jan 24 '25 15:01 benkli01

@benkli01 Thank you for tracking down the problem with the Qualcomm jobs! @shewu-quic @cccclai please let us know if the change to qualcomm/CMakeLists.txt is ok.

Jan 28 '25 23:01 dbort

@dbort has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Jan 28 '25 23:01 facebook-github-bot

Add event tracing and ETDumps to executor_runner

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5027

:white_check_mark: No Failures