Local change to export llama to QNN.
- AOT, generate a QNN-delegated model (see the sketch below for roughly what this step does): python -m examples.models.llama2.export_llama --qnn --use_kv_cache -p /home/chenlai/models/stories110M/params.json -c /home/chenlai/models/stories110M/stories110M.pt
- Runtime: follow build_llama_android.sh with the QNN config enabled, then run: ./llama_main --model_path=./stories_qnn_SM8450.pte --tokenizer_path=./tokenizer.bin --prompt="Once"
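For reference, a conceptual sketch of what the AOT step amounts to: export the model, lower the QNN-supported subgraphs to the Qualcomm delegate, and serialize a .pte file. This is not the export_llama implementation itself; the QnnPartitioner import path, its constructor argument, and the lower_to_qnn helper name are assumptions here, and the real script wires all of this up behind the --qnn / --use_kv_cache flags.

import torch
from executorch.exir import to_edge

# Assumed import path for the Qualcomm partitioner (check the backends/qualcomm sources).
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner

def lower_to_qnn(model, example_inputs, compile_specs):
    # Capture the model graph with torch.export.
    exported = torch.export.export(model, example_inputs)
    edge = to_edge(exported)
    # Delegate the partitions QNN can handle; the rest stays in portable ops.
    # The compile_specs argument (SoC, HTP options, ...) is an assumption here.
    edge = edge.to_backend(QnnPartitioner(compile_specs))
    exec_prog = edge.to_executorch()
    with open("stories_qnn_SM8450.pte", "wb") as f:
        f.write(exec_prog.buffer)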
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2985
:x: 5 New Failures
As of commit 796ae1cef53cee0ff3968b3de25cd9bfa06c399c with merge base d3326a2073dee7baf78044fb3afd0772edbc616a:
NEW FAILURES - The following jobs have failed:
- Lint / lintrunner / linux-job (gh)
  >>> Lint for examples/models/llama2/llama_transformer.py
- pull / test-llama-runner-linux (fp32, buck2, portable) / linux-job (gh)
  RuntimeError: Command docker exec -t f8db1f04ffa27c1d432eba898f549c2a98cc3d71b4edafe87670fd8e5104d67c /exec failed with exit code 1
- pull / test-llama-runner-linux (fp32, buck2, xnnpack+kv+custom) / linux-job (gh)
  RuntimeError: Command docker exec -t 2d2e935f1d4da5be6bb495d853a8faba9ef47afacb15dd129eb1a78f73c8e9e3 /exec failed with exit code 1
- pull / test-llama-runner-linux (fp32, cmake, portable) / linux-job (gh)
  RuntimeError: Command docker exec -t f1a3a958fc32fe81b0b0c87467852990ad5a97f6a54a88b74b9f3a33d1f3939d /exec failed with exit code 1
- pull / test-llama-runner-linux (fp32, cmake, xnnpack+kv+custom) / linux-job (gh)
  RuntimeError: Command docker exec -t 1e9ea998826699c124c67ad043587024ab896a8543e07888d3d4381e2712c75f /exec failed with exit code 1
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi Chen, thanks for sharing. I tried to reproduce this but hit the error below. May I ask what step I am missing?
cmake-android-out/examples/models/llama2/llama_main: 1 file pushed. 36.5 MB/s (542730752 bytes in 14.174s)
llama2.pte: 1 file pushed. 66.5 MB/s (196377840 bytes in 2.816s)
tokenizer.bin: 1 file pushed. 17.4 MB/s (433869 bytes in 0.024s)
cmake-android-out/lib/libqnn_executorch_backend.so: 1 file pushed. 25.2 MB/s (1025160 bytes in 0.039s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtp.so: 1 file pushed. 24.8 MB/s (1573896 bytes in 0.061s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtpV75Stub.so: 1 file pushed. 20.3 MB/s (291992 bytes in 0.014s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnSystem.so: 1 file pushed. 24.0 MB/s (230864 bytes in 0.009s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so: 1 file pushed. 53.0 MB/s (12046348 bytes in 0.217s)
2024-04-12T11:13:36+08:00 - Running...
2024-04-12T11:13:36+08:00 - export LD_LIBRARY_PATH=/data/local/tmp/llama2_cc:/opt/qcom/aistack/qnn/2.21.0.240326/lib/x86_64-linux-clang && export ADSP_LIBRARY_PATH=/data/local/tmp/llama2_cc && cd /data/local/tmp/llama2_cc && ./llama_main --model_path=./llama2.pte --tokenizer_path=./tokenizer.bin --prompt='Once'
E 00:00:00.000208 executorch:operator_registry.cpp:75] Re-registering aten::sym_size.int, from NOT_SUPPORTED
E 00:00:00.000392 executorch:operator_registry.cpp:76] key: (null), is_fallback: true
F 00:00:00.000432 executorch:operator_registry.cpp:33] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
Aborted
Regarding the sym_size error: you may need this change: https://github.com/pytorch/executorch/pull/2934
In the meantime, this line probably needs to be updated because there is a bug in the constant prop pass:
m = convert_pt2e(m, fold_quantize=False)
I've submitted a change (https://github.com/pytorch/pytorch/pull/123909) to fix the constant prop pass.
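For context, a minimal sketch of where that convert_pt2e call sits in the PT2E quantization flow. The QnnQuantizer import path and the quantize_pt2e_flow helper name are assumptions; the fold_quantize=False workaround is the one discussed above.

import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

# Assumed import path for the Qualcomm quantizer.
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer

def quantize_pt2e_flow(model, example_inputs, calibration_inputs):
    quantizer = QnnQuantizer()
    # Depending on the PyTorch version, capture_pre_autograd_graph may be used instead.
    m = torch.export.export(model, example_inputs).module()
    m = prepare_pt2e(m, quantizer)
    for inp in calibration_inputs:
        m(*inp)  # calibrate with representative data
    # Keep fold_quantize=False until the constant prop pass fix (pytorch/pytorch#123909) lands.
    m = convert_pt2e(m, fold_quantize=False)
    return m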
Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just on these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32
Thanks for your reply. I will try it.
That's great, we will try to refine our dependency. For now, qnn_executorch_backend depends on the executorch_no_prim_ops target:
https://github.com/pytorch/executorch/blob/6acc86ff5d869025cc874afba8051146b1daf112/backends/qualcomm/CMakeLists.txt#L251
May I ask which target you would recommend?
We probably need to check the corresponding CMake target. In Buck, it's runtime/backend:interface, which should already include "//runtime/core:core", "//runtime/core:evalue", "//runtime/core:event_tracer", and "//runtime/core:memory_allocator".
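Loosely, the suggestion looks something like the following Buck-style sketch (hypothetical only: the runtime.cxx_library macro usage and the placeholder sources are not the actual entry; the target names come from the links above):

# Hypothetical sketch, not the real BUCK/targets.bzl entry. The point is that
# qnn_executorch_backend can depend on the backend interface target rather
# than the full executorch library.
runtime.cxx_library(
    name = "qnn_executorch_backend",
    srcs = ["QnnExecuTorchBackend.cpp"],  # placeholder source list
    deps = [
        "//runtime/backend:interface",
        # interface already pulls in:
        #   //runtime/core:core
        #   //runtime/core:evalue
        #   //runtime/core:event_tracer
        #   //runtime/core:memory_allocator
    ],
)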
I can run it now. May I check the results with you?
I get 37 partitions and the accuracy is not good, e.g. "Once nieíoVA аas blablabla".
We have investigated this on our side. The issue seems related to RMSNorm: we observe a larger quantization scale (about 10~30) for the mul op in RMSNorm. When I fall back RMSNorm to CPU (25 partitions), I get better results, such as "Once upon a time, there was a mommy and a daddy blablalalb". But as you can see, there is still a gap from the expected output, "Once upon a time, there was a little girl named Lily. She loved to play outside". We are trying to fix it.
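For reference, a sketch of a standard RMSNorm (not the exact code in llama_transformer.py): the elementwise mul being discussed is the one combining the activation with the rsqrt term and the learned weight, which can produce a wide value range and thus a large quantization scale.

import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Mean of squares over the hidden dimension, then reciprocal square root.
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    # This mul (and the scaling by weight) is where the ~10-30 scale shows up,
    # which hurts accuracy when the op is quantized and delegated.
    return x * inv_rms * weight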