Runner action hangs after killing emulator with stop: not implemented

Open ericswpark opened this issue 2 years ago • 32 comments

When running my GitHub Actions workflow, the emulator runner action hangs after the emulator is killed, with the following log output:


Terminate Emulator
  /usr/local/lib/android/sdk/platform-tools/adb -s emulator-5554 emu kill
  OK: killing emulator, bye bye
  OK
  INFO    | Wait for emulator (pid 4126) 20 seconds to shutdown gracefully before kill;you can set environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL(in seconds) to change the default value (20 seconds)
INFO    | Discarding the changed state: command-line flag
WARNING | Discarding the changed state (command-line flag).
ERROR   | stop: Not implemented

ericswpark avatar Mar 18 '24 23:03 ericswpark

This seems to be related to #381. I debugged my workflow run with tmate which similarly had two crashpad_handler processes running and the same logs as in the issue.

Terminating the two crashpad_handler processes with SIGTERM allowed the android-emulator-runner step to complete.

grodin avatar Mar 22 '24 19:03 grodin

Seems like there should be a step at the end that kill -9s all the crashpad_handler processes.

Either that or disable crashpad_handler from running in the first place (I'm guessing it's some sort of error reporting mechanism from Google to report errors with the Android emulator?)

ericswpark avatar Mar 23 '24 21:03 ericswpark

So far I've managed to find out that crashpad-handler is the daemon part of crashpad, a crash reporter.

I've done some digging in the emulator source repo. It seems the emulator uses crashpad to report crashes back to Google, so we're not going to be able to prevent the crashpad-handler processes getting started.

Killing them after the emulator has shut down seems like a reasonable workaround, but we should do that with SIGTERM first! Going straight to kill -KILL is a bit, er, overkill. I can't immediately think of any harm, given that the VM the action is running in will be thrown away soon, but it's generally not recommended to use SIGKILL until necessary.

I'm fairly certain that this will need to be done as part of the action though.

I suspect, but haven't confirmed, that the root cause of this is that when the emulator is told to shut down, some node.js code sends a signal to the emulator, but then waits for the whole process tree to end, not just the emulator process. However, if the emulator doesn't pass on signals to its child processes, they won't receive any signal telling them to quit, so the waiting will go on forever.

If I'm correct (will try to find out this week), trying to kill the extra processes outside of the action won't work, since that code will be waiting for the action to finish to have a chance to run. It's a classic deadlock effectively.
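One way to check this hypothesis from a debugging session is to list everything that still hangs off the emulator process before it is told to exit. The sketch below is only an illustration: `descendants_of` is a hypothetical helper name, not part of the action, and it assumes `pgrep` with the `-P` (parent PID) option is available on the runner.

```shell
#!/bin/sh
# Hypothetical helper (not part of android-emulator-runner):
# print the PID of every descendant of the given PID, depth first.
descendants_of() {
  # pgrep -P lists the direct children of a parent PID.
  for dof_child in $(pgrep -P "$1"); do
    echo "$dof_child"
    descendants_of "$dof_child"
  done
}
```

For example, `descendants_of 4126` (the emulator PID from the log above) would print any child PIDs, which can then be inspected with `ps -o pid,ppid,comm -p <pid>`.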

grodin avatar Mar 25 '24 15:03 grodin

Are there any updates here? My log output is exactly the same as in the original issue: it completes all processes, then just hangs forever. Trying to kill the crashpad process with kill -9 $(pgrep -f crashpad_handler) fails.

benszedlmayer avatar Jun 18 '24 15:06 benszedlmayer

I have a similar issue here: https://github.com/callstack/react-native-pager-view/actions/runs/9576519647/job/26403174451?pr=829 . Does anyone know how to fix it?

troZee avatar Jun 19 '24 05:06 troZee

I have the same issue. Emulator just hangs forever https://github.com/synonymdev/react-native-ldk/actions/runs/9724215820/job/26840196419?pr=251

limpbrains avatar Jun 29 '24 13:06 limpbrains

For anyone facing this issue: following what @grodin mentioned regarding crashpad_handler, you can use the following step to terminate the processes:

- name: Kill crashpad_handler processes
  if: always()
  run: |
    pkill -SIGTERM crashpad_handler || true
    sleep 5
    pkill -SIGKILL crashpad_handler || true

This should definitely stop the hang issue.

EDIT: Revisited this recently, and it seems like it's fixed and not needed anymore. Not sure if that's the case for everyone or if it's still happening in certain scenarios...
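A variation on the step above, as a sketch only. On Linux the kernel keeps at most 15 characters of a process name, and crashpad_handler is 16 characters long, so matching on the name alone (pkill or pgrep without -f) may find nothing. The function below matches on the full command line instead and escalates to SIGKILL only if something survives SIGTERM. `stop_by_pattern` and its `sbp_*` variables are hypothetical names, and the 5-second default is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch: ask every process whose command line matches PATTERN to exit,
# then force-kill whatever is still alive after TIMEOUT seconds.
stop_by_pattern() {
  sbp_pattern="$1"
  sbp_timeout="${2:-5}"

  # -f matches the full command line rather than the truncated process name.
  pkill -TERM -f "$sbp_pattern" || return 0   # nothing matched, nothing to do

  # Poll once per second instead of always sleeping for the full timeout.
  sbp_waited=0
  while pgrep -f "$sbp_pattern" >/dev/null && [ "$sbp_waited" -lt "$sbp_timeout" ]; do
    sleep 1
    sbp_waited=$((sbp_waited + 1))
  done

  # Escalate only if something ignored SIGTERM.
  if pgrep -f "$sbp_pattern" >/dev/null; then
    pkill -KILL -f "$sbp_pattern" || true
  fi
  return 0
}
```

In a workflow step this would be called as `stop_by_pattern crashpad_handler`. Because -f matches command lines, the pattern must not also appear in the command line of the calling shell.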

mustalk avatar Jul 17 '24 21:07 mustalk

@mustalk are you sure that step will work? My understanding is that the previous step will hang and stop execution of that step that will kill crashpad_handler.

ericswpark avatar Jul 18 '24 03:07 ericswpark

@ericswpark that's what i thought at first too, but to my surprise it did execute, even without the if: always(), at least in my setup, give it a try.

mustalk avatar Jul 18 '24 09:07 mustalk

I was having this issue while running the Android emulator manually (I'm not using android-emulator-runner), and came here looking for answers. After that I discovered the solution: you need to signal the Android emulator's qemu process with SIGSTOP. For example:

# XXXXXX is the PID of the Android SDK qemu-system process
kill -STOP XXXXXX

That will handle snapshot generation and crashpad_handler as expected, and the emulator will end successfully.

fernando-jascovich avatar Jul 18 '24 15:07 fernando-jascovich

@mustalk your suggestion didn't work for me https://github.com/ashishb/adb-enhanced/actions/runs/10024919828/job/27707518728?pr=246, it is stuck at the emulator execution step for me

Strangely, it only impacts API 26 and 29 though for me.

ashishb avatar Jul 21 '24 02:07 ashishb

Hi Team,

We are also facing a similar issue. It was working 2 days ago, but it suddenly started failing with the error below. [Screenshot 2024-07-24 at 11 34 52 PM]

The script we are using is:

runs-on: macos-13
timeout-minutes: 25
steps:
  - name: Checkout the code
    uses: actions/checkout@v4

  - name: set up JDK 17
    uses: actions/setup-java@v4
    with:
      distribution: 'temurin'
      java-version: 17

  - name: Gradle cache
    uses: gradle/gradle-build-action@v3

  - name: AVD cache
    uses: actions/cache@v4
    id: avd-cache
    with:
      path: |
        ~/.android/avd/*
        ~/.android/adb*
      key: avd-29

  - name: create AVD and generate snapshot for caching
    if: steps.avd-cache.outputs.cache-hit != 'true'
    uses: reactivecircus/android-emulator-runner@v2
    env:
      ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60
    with:
      api-level: 29
      force-avd-creation: false
      emulator-options: -no-window -gpu swiftshader_indirect -noaudio -no-boot-anim -no-metrics -camera-back none
      disable-animations: false
      script: echo "Generated AVD snapshot for caching."

  - name: Run espresso tests
    uses: reactivecircus/android-emulator-runner@v2
    with:
      api-level: 29
      avd-name: test
      force-avd-creation: false
      emulator-options: -no-snapshot-save -no-window -gpu swiftshader_indirect -noaudio -no-boot-anim -no-metrics -camera-back none
      disable-animations: true
      script: ./gradlew connectedMockDebugAndroidTest

Can someone suggest what is wrong here?

Bhuvanaarkala07 avatar Jul 24 '24 18:07 Bhuvanaarkala07

The fix that worked for me was to use macos-latest instead:

  1. https://github.com/ashishb/adb-enhanced/pull/246
  2. https://github.com/ashishb/adb-enhanced/pull/248

ashishb avatar Jul 28 '24 02:07 ashishb

We are already using runs-on: macos-13, but it is still showing the above error.

Bhuvanaarkala07 avatar Jul 28 '24 18:07 Bhuvanaarkala07

I am facing the same issue. The step does not terminate the emulator, and the workflow stays stuck in that step. I tried @mustalk's suggestion, but the workflow is not able to reach the step where it kills the crashpad_handler.

    - name: Set up the Android emulator and run tests
      uses: reactivecircus/android-emulator-runner@v2
      with:
        api-level: 33
        target: google_apis_playstore
        arch: x86_64
        emulator-boot-timeout: 600
        disable-animations: true
        script: ./scripts/run-tests.sh

Added process termination commands within the custom script ./scripts/run-tests.sh, but still no success. This script runs the Appium testing that I have integrated.

In addition, I have:

  • ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60
  • My job has the following values:
    runs-on: ubuntu-latest
    uses: reactivecircus/android-emulator-runner@v2

Context and Background

Emulator: Running Android emulators on GitHub Actions can be challenging due to the lack of KVM support on ubuntu-latest.

macOS vs. Ubuntu: The macos-latest runner includes pre-installed Android SDKs and better support for Android emulation. However, using macos-latest is more expensive compared to ubuntu-latest. Setting up a self-hosted macOS runner might be a cost-effective solution, but I would like to try with the Ubuntu image first, if possible.

Hardware Acceleration: Starting on February 23, 2023, GitHub Actions users can leverage hardware acceleration on larger Linux runners, significantly improving Android emulator performance. This requires adding the runner user to the KVM user group:

- name: Enable KVM group perms
  run: |
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
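
Whether the rule above took effect can be checked before the emulator starts. This is a minimal sketch, assuming the KVM device lives at the usual Linux path /dev/kvm; `kvm_usable` is a hypothetical helper name, not something the action provides.

```shell
#!/bin/sh
# Sketch: report whether a device node is readable and writable by the
# current user. The default path /dev/kvm is an assumption about the runner.
kvm_usable() {
  kvm_dev="${1:-/dev/kvm}"
  if [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
    echo "$kvm_dev is accessible"
    return 0
  fi
  echo "$kvm_dev is missing or not accessible" >&2
  return 1
}
```

Calling `kvm_usable` in a step right after the udev commands makes a missing or locked-down device fail fast instead of surfacing later as a slow or stuck emulator.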

Questions:

Any suggestions or guidance on resolving this issue would be greatly appreciated. Specifically, I need help ensuring the emulator terminates properly, and the workflow can proceed without getting stuck.

I read earlier that this could be fixed by using the macos-latest runner. Is there any possibility to fix this by using the ubuntu-latest one ?

Braggiouy avatar Jul 31 '24 09:07 Braggiouy

I've faced the same issue and was able to resolve it by making sure the appium instance I created in the test script gets shut down properly by the end of the script (cc @Braggiouy).

you can check pgrep -f appium before and after your script execution
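
The before-and-after check can be wrapped in a small function, sketched here under the assumption that `pgrep` is available. `check_no_leftovers` is a hypothetical name. It prints whatever still matches and returns non-zero, so it can also be used to fail a step.

```shell
#!/bin/sh
# Sketch: succeed only if no running process has a command line
# matching the given pattern; otherwise list the matches and fail.
check_no_leftovers() {
  # -f matches the full command line, -l prints more than the bare PID.
  if pgrep -f -l "$1"; then
    echo "still running: processes matching '$1'" >&2
    return 1
  fi
  echo "no processes matching '$1'"
}
```

For example, running `check_no_leftovers appium` as the last line of the test script shows whether an Appium instance outlived the tests.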

vaind avatar Aug 01 '24 16:08 vaind

Thanks a million @vaind. That was indeed my issue. Seems that the appium instance was still running in the background, not allowing the Emulator to shut down properly. No need to manually kill the crashpad_handler. Good catch !

Braggiouy avatar Aug 02 '24 09:08 Braggiouy

I ran into a similar issue and seemingly fixed it by removing the step for AVD caching and setting force-avd-creation: true. This is obviously not ideal but it seems to get rid of the flakiness.

pyricau avatar Sep 09 '24 21:09 pyricau

Same issue here, from React Native. Everything was working fine. Then I started working on that to add a new dimension to our matrix and it is still hanging.

I tried all the solutions proposed here:

  • kill the crashpad_handle ==> does nothing
  • kill the qemu_system process ==> the Terminate emulator command hangs with no logs (there is no emulator anymore)
  • moved to macos executor ==> does nothing
  • increased timeouts ==> does nothing
  • set force-avd-creation: true ==> does nothing

I'm now trying to give more power to the machine and to the emulator to see if it helps.

Weirdly, on main it is still working fine.

  • main: https://github.com/facebook/react-native/actions/runs/11004956599/job/30570724297
  • my working branch: https://github.com/facebook/react-native/actions/runs/10995492669/job/30527531422?pr=46573

@ychescale9 I'm sorry to ping you directly, but do you mind having a look at this or rerouting it to the right person? I might be able to help, if you need, but I'd need some guidance in the codebase.

cipolleschi avatar Sep 24 '24 09:09 cipolleschi

Does it fail with all API levels? And did it just start failing recently? Chances are a newer version of the emulator binary or system image might have introduced new issues. If that's the case, pinning the emulator-build to an older version might help.

ychescale9 avatar Sep 24 '24 11:09 ychescale9

Thanks! I think I found the issue: I have a local server to which the emulator connects using websockets. If I tear down the server, the emulator shuts down properly. I think that the action is not able to shut down the emulator if there is something that it is connected to.

cipolleschi avatar Sep 25 '24 12:09 cipolleschi

For me it only happens when avd cache is enabled and the cache itself exists before the run. After removing cache file the whole pipeline goes well for the first time. On the second push it always fails. It's true for API <= 29. It never fails for API 34 and I haven't tested it yet between 29 and 34.

Can anybody confirm that having a cache can somehow cause the issue?

ochkarik05 avatar Oct 19 '24 07:10 ochkarik05

@Braggiouy How did you manage to kill the appium instance ? I also have Appium tests integrated. I use:

kill -9 $(pgrep -f appium)

right after the gradle command line that executes the Appium tests. But the workflow never reaches that line: it gets stuck in the gradle command line, causing the job to hang indefinitely.

trgdang avatar Nov 08 '24 09:11 trgdang

In my solution, I use an external script that runs alongside reactivecircus/android-emulator-runner@v2.

This script is responsible for starting Appium, running the tests, and then killing the Appium process afterward.

As an example, this is how I managed to follow the previous steps :

# Start Appium Server
appium &
APPIUM_PID=$!
sleep 10

# Run Android Tests
yarn test:android

# Shut down Appium Server
kill $APPIUM_PID

Explanation: In my approach, I explicitly control the starting and stopping of Appium. First, I start Appium in the background and capture its process ID (PID). This way, I can track the specific Appium instance while the tests run. Once the tests are finished, I use the stored PID to kill that exact instance of Appium, ensuring it doesn’t interfere with any other processes.

This avoids the issue with the method you mentioned, where the kill -9 $(pgrep -f appium) command might terminate Appium too early or fail if the Gradle process is still running.
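
The same start, test, stop sequence can be written so that the background service is stopped even when the tests fail, and so that the tests' exit status is what the script reports. This is a sketch, not the script used above: `run_with_service` and the `rws_*` variables are hypothetical names, and both arguments are plain command strings.

```shell
#!/bin/sh
# Sketch: run SERVICE_CMD in the background, run TEST_CMD in the foreground,
# then stop the service and hand back TEST_CMD's exit status.
# rws_pid is left set so the caller can inspect it afterwards.
run_with_service() {
  # exec makes the background shell become the service, so $! is its PID.
  sh -c "exec $1" &
  rws_pid=$!

  # Record the test result without aborting under `set -e`.
  rws_status=0
  sh -c "$2" || rws_status=$?

  # Stop the service whether the tests passed or failed, then reap it.
  kill "$rws_pid" 2>/dev/null || true
  wait "$rws_pid" 2>/dev/null || true
  return "$rws_status"
}
```

With the commands from the script above this would be `run_with_service "appium" "sleep 10 && yarn test:android"`, where the `sleep 10` stands in for the startup wait.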

Hope this helps!

Braggiouy avatar Nov 08 '24 10:11 Braggiouy

Just for completeness' sake, I've worked around this by adding the following line to the end of the script run by the action:

killall -INT crashpad_handler || true

The || true is just so that if something gets fixed somewhere and there aren't any crashpad_handler processes jamming things up, that won't fail the job.

It seems to fix things for me, so far.

grodin avatar Nov 21 '24 19:11 grodin

Thank you. :+1: It seems to work.

I noticed some Invalid argument warnings from prctl in crashpad_client_linux.cc in the console output, though:

...
> Task :database:createDebugAndroidTestApkListingFileRedirect
[EmulatorConsole]: Failed to start Emulator console for 5554

> Task :database:connectedDebugAndroidTest
Starting 4 tests on emulator-5554 - 8.0.0

Finished 4 tests on emulator-5554 - 8.0.0

> Task :database:connectedAndroidTest

BUILD SUCCESSFUL in 56s
69 actionable tasks: 69 executed
Terminate Emulator
  /usr/local/lib/android/sdk/platform-tools/adb -s emulator-5554 emu kill
  OK: killing emulator, bye bye
  OK
  INFO         | Wait for emulator (pid 2248) 20 seconds to shutdown gracefully before kill;you can set environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL(in seconds) to change the default value (20 seconds)
  
USER_INFO    | Snapshots have been disabled by the user, save request is ignored.
[2815:2815:20250110,125053.533431:WARNING crashpad_client_linux.cc:419] prctl: Invalid argument (22)
ERROR        | stop: Not implemented
WARNING      | Emulator client has not yet been configured.. Call configure me first!
[2825:2825:20250110,125054.114918:WARNING crashpad_client_linux.cc:419] prctl: Invalid argument (22)

johnjohndoe avatar Jan 10 '25 15:01 johnjohndoe

name: Running Android Tests and sonar analyze
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Set up JDK 20
    uses: actions/setup-java@v4
    with:
      java-version: 20
      distribution: 'adopt'

  - name: Change wrapper permissions
    run: chmod +x ./gradlew

  - name: Setup Android SDK
    uses: android-actions/setup-android@v3

  - name: Enable KVM group perms
    run: |
      echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
      sudo udevadm control --reload-rules
      sudo udevadm trigger --name-match=kvm

  - name: Start Android Emulator
    uses: reactivecircus/android-emulator-runner@v2
    with:
      api-level: 35
      target: google_apis
      ram-size: 2048M
      disk-size: 4096M
      arch: x86_64
      profile: pixel_5
      avd-name: test-emulator
      disable-animations: true
      emulator-options: "-no-window -no-audio -gpu swiftshader "
      cmake: 3.10.2.4988404
      script: |
        adb wait-for-device
        adb shell input keyevent 82
        adb shell pm list instrumentation
        adb devices && adb shell getprop
        ./gradlew :lib:assembleDebugAndroidTest
        ./gradlew :lib:testDebugUnitTest && ./gradlew jacocoFullReport --info

alexinx avatar Feb 13 '25 11:02 alexinx

Can I run mvn clean test outside the emulator script? Apparently the emulator gets killed after the script finishes, so any step after that fails because the device is already offline.

freddywaiganjo avatar Mar 27 '25 20:03 freddywaiganjo

I personally went for this:

    - name: 📲 Happy path E2E test
      uses: reactivecircus/android-emulator-runner@v2
      with:
          api-level: 30
          arch: x86_64
          target: google_apis
          force-avd-creation: false
          emulator-options: -no-snapshot-save -no-window -gpu swiftshader_indirect -noaudio -no-boot-anim -camera-back none
          script: e2e_tests.sh

e2e_tests.sh

#!/usr/bin/env bash

set -euo pipefail

terminate_crashpad_handler() {
  # Emulator might hang forever in some circumstances
  # Try to kill problematic process

  # Try SIGTERM first
  echo "Try stopping crashpad_handler…"
  pkill -f -SIGTERM crashpad_handler || true

  # Wait for the process to terminate, and SIGKILL after 5 seconds if still alive
  sleep 5
  if pgrep -f crashpad_handler >/dev/null; then
    echo "crashpad_handler still not terminated, try killing ☠️…"
    pkill -f -SIGKILL crashpad_handler || true
  fi
}

trap terminate_crashpad_handler EXIT

./gradlew connectedDebugAndroidTest

opatry avatar May 30 '25 18:05 opatry

Hi, is there a fix yet? Or some workaround? None of the scripts mentioned here seem to help.

besteFinanzen avatar Sep 24 '25 21:09 besteFinanzen