Crash (native stack) for llama2 demo on Android

Open AmarOk1412 opened this issue 1 year ago • 9 comments

Environment

Followed: https://pytorch.org/executorch/stable/llm/llama-demo-android.html

Downloaded llama2-chat (7B) and ran the Android demo (Pixel 6, Android 14).

Using commit: 088cedfb9beb45ee4a66759e7bddc52e9366989b (HEAD, tag: v0.2.1-rc5, tag: v0.2.1, origin/release/0.2)

Steps to reproduce

  1. Run the demo
  2. Input:
<s>
[INST]
<<SYS>>
You're a parser. Your role is to take a sentence in argument, parse it and output a minized JSON and only a JSON (NO OTHER SENTENCE).
The sentence can contains several ingredients (with optional quantity and units)
E.g: "50ml of milk" = {"ingredients":[{"ingredient": "milk", "quantity": 50, "unit": "mL"}]}
Or: "salt and pepper" = {"ingredients":[{"ingredient": "salt"},{"ingredient":"pepper"}]}
Avoid useless words e.g.: "salt as you wish" = {"ingredients":[{"ingredient": "salt"}]}.
Units should be metrics or for american kitchen (like cups, oz, cL, mL)
Sentences can be from French or english
<</SYS>>
"salt and pepper"
[/INST]
  3. Generate => Crash

Output

Crash:

2024-07-05 10:53:47.789 31097-31179 libc                    com.example.executorchllamademo      A  Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 31179 (Thread-3), pid 31097 (utorchllamademo)
---------------------------- PROCESS STARTED (31186) for package com.example.executorchllamademo ----------------------------
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A  Cmdline: com.example.executorchllamademo
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A  pid: 31097, tid: 31179, name: Thread-3  >>> com.example.executorchllamademo <<<
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #01 pc 00000000035c4624  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (et_pal_abort+8) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #02 pc 00000000035c45d4  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (torch::executor::runtime_abort()+8) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #03 pc 00000000035b7f78  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (torch::executor::Runner::generate(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::function<void (std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)>, std::__ndk1::function<void (torch::executor::Runner::Stats const&)>)+4384) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #04 pc 000000000012fefc  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (executorch_jni::ExecuTorchLlamaJni::generate(facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>)+124) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #05 pc 000000000013019c  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (facebook::jni::detail::MethodWrapper<int (executorch_jni::ExecuTorchLlamaJni::*)(facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>), &(executorch_jni::ExecuTorchLlamaJni::generate(facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>)), executorch_jni::ExecuTorchLlamaJni, int, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni> >::dispatch(facebook::jni::alias_ref<facebook::jni::detail::JTypeFor<facebook::jni::HybridClass<executorch_jni::ExecuTorchLlamaJni, facebook::jni::detail::BaseHybridClass>::JavaPart, facebook::jni::JObject, void>::_javaobject*>, facebook::jni::alias_ref<_jstring*>&&, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>&&)+88) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.867 31183-31183 DEBUG                   crash_dump64                         A        #06 pc 00000000001300a4  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/lib/arm64/libexecutorch_llama_jni.so (facebook::jni::detail::FunctionWrapper<int (*)(facebook::jni::alias_ref<facebook::jni::detail::JTypeFor<facebook::jni::HybridClass<executorch_jni::ExecuTorchLlamaJni, facebook::jni::detail::BaseHybridClass>::JavaPart, facebook::jni::JObject, void>::_javaobject*>, facebook::jni::alias_ref<_jstring*>&&, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>&&), facebook::jni::detail::JTypeFor<facebook::jni::HybridClass<executorch_jni::ExecuTorchLlamaJni, facebook::jni::detail::BaseHybridClass>::JavaPart, facebook::jni::JObject, void>::_javaobject*, int, facebook::jni::alias_ref<_jstring*>, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni> >::call(_JNIEnv*, _jobject*, _jstring*, facebook::jni::detail::JTypeFor<executorch_jni::ExecuTorchLlamaCallbackJni, facebook::jni::JObject, void>::_javaobject*, int (*)(facebook::jni::alias_ref<facebook::jni::detail::JTypeFor<facebook::jni::HybridClass<executorch_jni::ExecuTorchLlamaJni, facebook::jni::detail::BaseHybridClass>::JavaPart, facebook::jni::JObject, void>::_javaobject*>, facebook::jni::alias_ref<_jstring*>&&, facebook::jni::alias_ref<executorch_jni::ExecuTorchLlamaCallbackJni>&&))+84) (BuildId: 39b8648fd1a373ba9a341709299c076d41970345)
2024-07-05 10:53:48.868 31183-31183 DEBUG                   crash_dump64                         A        #12 pc 0000000000ebfb18  /data/app/~~m7Cz0fiHuS0Pomw97DvFSQ==/com.example.executorchllamademo-Bo37H0Ldn-EE8QF5dXCgcg==/oat/arm64/base.vdex (com.example.executorchllamademo.MainActivity$2.run+0)
---------------------------- PROCESS ENDED (31097) for package com.example.executorchllamademo ----------------------------

However, native debugging didn't give me enough information to dig further.

AmarOk1412 avatar Jul 05 '24 14:07 AmarOk1412
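
The frames in the trace above are hard to read because of the long install paths and template signatures. A minimal sketch, assuming each frame line has the shape `#NN pc <hex> <path> (<symbol>+<offset>)` with an optional trailing BuildId as in the log above, that reduces a line to its frame index, program counter, module file name, symbol, and offset. The names `Frame` and `parse_frame` are hypothetical helpers, not part of ExecuTorch or the Android tooling.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    index: int    # frame number, e.g. 1 for "#01"
    pc: str       # program counter as printed (hex string)
    module: str   # file name of the mapped library, without its directory
    symbol: str   # demangled symbol text as printed
    offset: int   # byte offset into the symbol


# Assumed line shape, taken from the crash_dump64 output in this thread.
_FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+pc\s+(?P<pc>[0-9a-f]+)\s+(?P<path>\S+)\s+"
    r"\((?P<symbol>.*)\+(?P<offset>\d+)\)"
    r"(?:\s+\(BuildId: [0-9a-f]+\))?\s*$"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed frame, or None if the line is not a frame line."""
    match = _FRAME_RE.search(line)
    if match is None:
        return None
    module = match["path"].rsplit("/", 1)[-1]
    return Frame(int(match["index"]), match["pc"], module,
                 match["symbol"], int(match["offset"]))
```

Applied to the log above, the first frames reduce to `et_pal_abort`, `torch::executor::runtime_abort()`, and `torch::executor::Runner::generate(...)`, all in `libexecutorch_llama_jni.so`.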

Hi @AmarOk1412, do you have the system logs? Any logs from lmkd? On a Pixel 6 the process might get OOM-killed.

kirklandsign avatar Jul 05 '24 19:07 kirklandsign
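
One way to check a saved log for low-memory-killer activity is to filter it by keyword. A minimal sketch, assuming the relevant lines contain `lmkd` or `lowmemorykiller` (these substrings are an assumption and may differ by device and Android version); `find_lmkd_lines` is a hypothetical helper name.

```python
from typing import Iterable, List

# Assumed substrings; adjust to whatever your device actually logs.
KEYWORDS = ("lmkd", "lowmemorykiller")


def find_lmkd_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention any of the assumed keywords."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits
```

For a saved log file, this can be called as `find_lmkd_lines(open(path, errors="replace"))`. An empty result only means that no matching line was captured, not that memory pressure is ruled out.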

Sure, here is the log: 2024-07-06 14:58:59.log

I don't see any log entries for lmkd.

AmarOk1412 avatar Jul 06 '24 19:07 AmarOk1412

@AmarOk1412 Thank you for the updates! So you see no output tokens at all, and it just crashed?

kirklandsign avatar Jul 08 '24 22:07 kirklandsign

Yes exactly.

AmarOk1412 avatar Jul 08 '24 22:07 AmarOk1412

Hi @AmarOk1412, does it work for you on a recent build?

kirklandsign avatar Sep 04 '24 00:09 kirklandsign

For now, I can't use ExecuTorch v0.3.0 because of this error:

  Error while generating /home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-311/cmake-out/executorch_srcs.cmake. Exit code: 1
  Output:

  Error:
  Traceback (most recent call last):
    File "/home/amarok/Projects/executorch/build/buck_util.py", line 26, in run
      cp: subprocess.CompletedProcess = subprocess.run(
                                        ^^^^^^^^^^^^^^^
    File "/usr/lib64/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-311/cmake-out/buck2-bin/buck2-3bbde7daa94987db468d021ad625bc93dc62ba7fcb16945cb09b64aab077f284', 'cquery', "inputs(deps('//runtime/executor:program'))"]' returned non-zero exit status 2.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 218, in <module>
      main()
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 203, in main
      target_to_srcs[name] = sorted(target.get_sources(graph, runner))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 116, in get_sources
      sources: set[str] = set(runner.run(["cquery", query]))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/amarok/Projects/executorch/build/buck_util.py", line 31, in run
      raise RuntimeError(ex.stderr.decode("utf-8")) from ex
  RuntimeError: [2024-09-09T15:58:13.282-04:00] Build ID: cf93cd4f-c786-45c6-8c4c-2652dbfdc829
  [2024-09-09T15:58:13.305-04:00] File changed: root//.ci/docker/ci_commit_pins/torchao.txt
  [2024-09-09T15:58:13.305-04:00] File changed: root//.ci/scripts/build-qnn-sdk.sh
  [2024-09-09T15:58:13.305-04:00] File changed: root//.ci/scripts/setup-ios.sh
  [2024-09-09T15:58:13.305-04:00] 43815 additional file change events
  Command failed:
  From load at implicit location

  Caused by:
      0: From load at third-party/prelude/prelude.bzl:8
      1: From load at third-party/prelude/native.bzl:15
      2: From load at third-party/prelude/apple/apple_macro_layer.bzl:14
      3: From load at third-party/prelude/apple/apple_rules_impl_utility.bzl:10
      4: From load at third-party/prelude/apple/apple_bundle_types.bzl:8
      5: Error evaluating module: `prelude//apple/debug.bzl`
      6: Traceback (most recent call last):
           * third-party/prelude/apple/debug.bzl:34, in <module>
               _AppleDebugInfo = record(
         error: String literals are not allowed in type expressions: `"ArtifactTSet"` (at third-party/prelude/apple/debug.bzl:34:19-37:2)
           --> third-party/prelude/apple/debug.bzl:34:19
            |
         34 |   _AppleDebugInfo = record(
            |  ___________________^
         35 | |     debug_info_tset = "ArtifactTSet",
         36 | |     filtered_map = field([dict[Label, list[Artifact]], None]),
         37 | | )
            | |_^
            |

Full log:

executorch.v3.0-python3.11.log

So I can't generate a new .pte file.

AmarOk1412 avatar Sep 09 '24 20:09 AmarOk1412

Could you please try with the main branch and remove pip-out?

kirklandsign avatar Sep 09 '24 21:09 kirklandsign

From a fresh clone, on both the main branch and v0.3.0, I get:

  [ 11%] Creating directories for 'fxdiv'
  [ 22%] Performing download step (git clone) for 'fxdiv'
  Cloning into 'FXdiv-source'...
  Already on 'master'
  Your branch is up to date with 'origin/master'.
  [ 33%] Performing update step for 'fxdiv'
  -- Fetching latest from the remote origin
  [ 44%] No patch step for 'fxdiv'
  [ 55%] No configure step for 'fxdiv'
  [ 66%] No build step for 'fxdiv'
  [ 77%] No install step for 'fxdiv'
  [ 88%] No test step for 'fxdiv'
  [100%] Completed 'fxdiv'
  [100%] Built target fxdiv
  -- Using python executable '/home/amarok/Projects/executorch/.venv3.10/bin/python3.10'
  -- Resolved buck2 as /home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-310/cmake-out/buck2-bin/buck2-3bbde7daa94987db468d021ad625bc93dc62ba7fcb16945cb09b64aab077f284.
  -- Killing buck2 daemon
  -- executorch: Generating source lists
  -- executorch: Generating source file list /home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-310/cmake-out/executorch_srcs.cmake
  Error while generating /home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-310/cmake-out/executorch_srcs.cmake. Exit code: 1
  Output:

  Error:
  Traceback (most recent call last):
    File "/home/amarok/Projects/executorch/build/buck_util.py", line 26, in run
      cp: subprocess.CompletedProcess = subprocess.run(
    File "/usr/lib64/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/home/amarok/Projects/executorch/pip-out/temp.linux-x86_64-cpython-310/cmake-out/buck2-bin/buck2-3bbde7daa94987db468d021ad625bc93dc62ba7fcb16945cb09b64aab077f284', 'cquery', "inputs(deps('//runtime/executor:program'))"]' returned non-zero exit status 2.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 218, in <module>
      main()
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 203, in main
      target_to_srcs[name] = sorted(target.get_sources(graph, runner))
    File "/home/amarok/Projects/executorch/build/extract_sources.py", line 116, in get_sources
      sources: set[str] = set(runner.run(["cquery", query]))
    File "/home/amarok/Projects/executorch/build/buck_util.py", line 31, in run
      raise RuntimeError(ex.stderr.decode("utf-8")) from ex
  RuntimeError: Command failed:
  Error validating working directory

  Caused by:
      0: Failed to stat `/home/amarok/Projects/executorch/buck-out/v2`
      1: ENOENT: No such file or directory


  CMake Error at build/Utils.cmake:191 (message):
    executorch: source list generation failed
  Call Stack (most recent call first):
    CMakeLists.txt:311 (extract_sources)


  -- Configuring incomplete, errors occurred!
  error: command '/home/amarok/Projects/executorch/.venv3.10/bin/cmake' failed with exit code 1
  error: subprocess-exited-with-error
  
  × Building wheel for executorch (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/amarok/Projects/executorch/.venv3.10/bin/python3.10 /home/amarok/Projects/executorch/.venv3.10/lib64/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpigm3l8y0
  cwd: /home/amarok/Projects/executorch
  Building wheel for executorch (pyproject.toml) ... error
  ERROR: Failed building wheel for executorch
Failed to build executorch
ERROR: Could not build wheels for executorch, which is required to install pyproject.toml-based projects

AmarOk1412 avatar Sep 10 '24 14:09 AmarOk1412

Hi @AmarOk1412, you probably need to clean up everything, such as pip-out and cmake-out*, when switching between release versions 😅

kirklandsign avatar Oct 02 '24 21:10 kirklandsign
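
The cleanup suggested above can be sketched as a small script. This is a minimal sketch, assuming the stale build directories sit directly under the repository root and match `pip-out` and `cmake-out*` as named in the comment; `clean_build_dirs` is a hypothetical helper, not an ExecuTorch tool. It defaults to a dry run, so nothing is deleted until `dry_run=False` is passed.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def clean_build_dirs(
    repo_root: str,
    patterns: Sequence[str] = ("pip-out", "cmake-out*"),
    dry_run: bool = True,
) -> List[Path]:
    """List (and optionally delete) build directories under repo_root."""
    root = Path(repo_root)
    # Only top-level directories matching the patterns are considered.
    matches = sorted(
        path
        for pattern in patterns
        for path in root.glob(pattern)
        if path.is_dir()
    )
    if not dry_run:
        for path in matches:
            shutil.rmtree(path)
    return matches
```

Calling `clean_build_dirs(".")` from the checkout first shows what would be removed; repeating the call with `dry_run=False` removes those directories before the next install attempt.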