Detect and Recover when Browser Hangs/Crashes/Dies

Open emilyrohrbough opened this issue 2 years ago • 4 comments

Current behavior

Cypress does not handle browser tab crashes, hanging browsers or browsers that die unexpectedly. This causes Cypress to hang indefinitely until the process is manually stopped or CI times out.

Desired behavior

Cypress should handle tab crashes and time out on browser hangs.

  • Tab crash - Cypress should close the crashed tab, open a new tab and continue the test execution.

  • Browser hang - The Cypress runner should time out the test, send the status to the server to end the test, and report the failure to the dashboard (if recording is enabled) before killing the current browser instance and launching a new instance to continue test execution.

The quick(er) fix is to fail the current test and pick up the next test, so that results are still reported for the tests that were able to run. The ideal solution would be to re-attempt the test that experienced the crash, to reduce test flake and CI costs for users and/or to help identify memory issues within the code under test.
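
To make the two behaviors and the quick fix vs. ideal solution concrete, here is a rough sketch of the recovery decision as a pure function. Everything in it (planRecovery, the action names, maxRetries) is hypothetical and only illustrates the flow described above; none of it is an existing Cypress API.

```typescript
// Hypothetical sketch: none of these names are Cypress APIs.
type BrowserFailure = 'tab-crash' | 'browser-hang'

type RecoveryAction =
  | 'close-tab'
  | 'open-new-tab'
  | 'time-out-test'
  | 'report-failure'
  | 'kill-browser'
  | 'launch-browser'
  | 'retry-current-test'
  | 'run-next-test'

// Returns the ordered steps to take after a browser failure.
// `attempt` is the number of retries already used for the current test.
function planRecovery(
  failure: BrowserFailure,
  attempt: number,
  maxRetries: number
): RecoveryAction[] {
  const actions: RecoveryAction[] =
    failure === 'tab-crash'
      ? ['close-tab', 'open-new-tab']
      : ['time-out-test', 'report-failure', 'kill-browser', 'launch-browser']

  // Quick fix: maxRetries = 0 always moves on to the next test.
  // Ideal solution: maxRetries > 0 re-attempts the crashed test first.
  actions.push(attempt < maxRetries ? 'retry-current-test' : 'run-next-test')

  return actions
}
```

With maxRetries set to 0 this is the quick fix (the crashed test fails and the run moves on); any larger value gives the re-attempt behavior until the retries are used up.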

Considerations to Keep in Mind

When the browser tab and/or instance is killed and relaunched, ensure we release the Node resources initially used, so that JS memory does not grow with each launch.
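
One possible shape for this, as a minimal sketch: track every cleanup that belongs to a single browser launch and run them all before relaunching. LaunchResources and its methods are made-up names for illustration, not something that exists in the Cypress codebase.

```typescript
// Hypothetical sketch: LaunchResources is not a Cypress API.
type Cleanup = () => void

class LaunchResources {
  private cleanups: Cleanup[] = []

  // Register something to undo when this browser launch ends,
  // e.g. removing an event listener or closing a socket.
  track(cleanup: Cleanup): void {
    this.cleanups.push(cleanup)
  }

  // Number of cleanups still pending; handy for leak checks.
  get pending(): number {
    return this.cleanups.length
  }

  // Runs every cleanup in reverse registration order and empties the list.
  // Errors are collected instead of thrown, so one failing cleanup
  // cannot prevent the remaining ones from running.
  release(): unknown[] {
    const errors: unknown[] = []

    for (const cleanup of this.cleanups.splice(0).reverse()) {
      try {
        cleanup()
      } catch (err) {
        errors.push(err)
      }
    }

    return errors
  }
}
```

The idea is to create one LaunchResources per browser launch and call release() right before killing and relaunching, so nothing registered for the old instance outlives it.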

It would be great if there was a way to capture the crash reason to provide users with better info (e.g. the need to increase memory with shm_size, suggested as a solution for #6695).

Test code to reproduce (chrome)

Can be reproduced manually in Chrome using https://github.com/cypress-io/cypress-test-tiny/tree/issue-22506

  1. Run npm run cypress:run-hang (enables browser debug logs with headed Chrome).
  2. The first spec runs; when cy.pause() starts, enter chrome://crash or chrome://hang in the URL bar to view the behavior.

If you run DEBUG=cypress* npm run cypress:run --browser chrome --headed, you can see the full log output, with process_profiling logging continuously while Cypress hangs.

Cypress Version

Happening since v4.2. Current Version 10.3.0

Existing Issues Around This Behavior:

Issues to Do This Work:

  • Detect Browser Launching Crashes: #1022
  • Detect Browser Crashes: #6170 (all browsers), #1660 (electron)
  • Recover from Browser Crash: #349

Bug Reports:

  • Cypress Stuck/Hangs:
    • #8206
    • #18885
    • #19617
    • #22506
    • #9350 (possibly related - waiting on logs)
    • #20183
    • #6883
  • Killing Chrome Process Hangs Cypress:
    • #17893
    • #18002
  • Firefox Hanging:
    • #6449
    • likely related https://github.com/cypress-io/cypress-docker-images/issues/502

emilyrohrbough (Jun 30 '22 18:06)

Chrome Investigation

It appears that launcher/lib/browser logs the browser instance error but does nothing to pass it on to the server/lib/browsers instance. That instance is the one that connects to browser-cri-client, which in turn connects to chrome-remote-interface to listen for events, and it handles opening the browser and launching tabs, so it would also be the place to standardize exiting/killing the browser instance consistently across electron/firefox/chrome/edge.

The server/lib/browsers/chrome instance does not appear to listen for crash/hang messages, either to close the tab and reopen it or to restart the browser instance and continue the tests. Instead, Cypress hangs and keeps using resources (I have a running Cypress instance plus a crashed Chrome instance that have been running for 20 hours now). Because the hang is outside the scope of the mocha runner and we have no logic to time out when Cypress itself hangs, Cypress never times out on its own. In CI, it seems people either kill the process manually or the CI instance times out due to inactivity.
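
A rough sketch of the missing timeout logic, assuming the server process records a heartbeat whenever it hears from the browser and checks the gap periodically. HangWatchdog is a hypothetical name, not an existing Cypress class; timestamps are passed in explicitly to keep the sketch testable.

```typescript
// Hypothetical sketch: HangWatchdog is not a Cypress API.
class HangWatchdog {
  private readonly timeoutMs: number
  private lastBeat: number

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs
    this.lastBeat = now
  }

  // Call whenever the browser proves it is alive,
  // e.g. on any message received from it.
  beat(now: number): void {
    this.lastBeat = now
  }

  // True once no heartbeat has arrived for longer than the timeout.
  isHung(now: number): boolean {
    return now - this.lastBeat > this.timeoutMs
  }
}
```

In practice the server would call beat(Date.now()) on each incoming message and poll isHung(Date.now()) on a timer. Because the check lives in the server process rather than in the mocha runner, it keeps working when the browser stops responding.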

I have not tried to reproduce this on Firefox, but suspect we have a similar issue. Total shot in the dark, but the frequently observed issue where Firefox is unable to connect may be the same thing: maybe Firefox is hanging and we aren't capturing the message needed to properly kill and restart the instance. Possible resource: https://github.com/bsmedberg/crashfirefox-intentionally

Puppeteer handles this by throwing a page crash error.

How to crash the Chrome browser

  • https://stackoverflow.com/questions/40367087/how-to-crash-chrome-browser
  • crash - chrome://crash
    cypress:launcher:browsers:chrome stderr: [79726:259:0629/122233.586969:ERROR:chrome_debug_urls.cc(173)] Intentionally crashing (with null pointer dereference) because user navigated to chrome://crash/
    cypress-verbose:server:browsers:cri-client:recv:[<--] received CRI message { method: 'Inspector.targetCrashed', params: {} }
  • hang - chrome://hang
    cypress:server:browsers:chrome stderr: [32066:259:0630/090145.853211:ERROR:chrome_debug_urls.cc(199)] Intentionally hanging ourselves with sleep infinite loop because user navigated to chrome://hang/
    (no CRI message for hang)
  • quit - chrome://quit
  • kill - chrome://kill
  • restart - chrome://restart
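
Based on the log output for chrome://crash and chrome://hang, a minimal sketch of how these signals could be recognized. The CriMessage type and both helper names are hypothetical. The stderr matcher only covers the intentional debug URLs, so a real hang, which sends no CRI message, would still need a timeout.

```typescript
// Hypothetical sketch: these types and helpers are not Cypress APIs.
interface CriMessage {
  method: string
  params?: unknown
}

// 'Inspector.targetCrashed' is the CRI event logged after chrome://crash.
function isTabCrash(message: CriMessage): boolean {
  return message.method === 'Inspector.targetCrashed'
}

// Classifies the stderr lines Chrome prints for the chrome://crash and
// chrome://hang debug URLs. Returns null for any other line.
function classifyDebugStderr(line: string): 'crash' | 'hang' | null {
  if (/Intentionally crashing/.test(line)) return 'crash'
  if (/Intentionally hanging/.test(line)) return 'hang'

  return null
}
```

A listener on the CRI connection could pass each received message to isTabCrash and start tab recovery when it returns true.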

Resources:

Chrome errors:

  • error code 5 - runtime error caused by a memory leak, a Chrome logic error, or Chrome crash input not being received.
    • https://windowsreport.com/chrome-error-code-5/
    • https://piunikaweb.com/2021/07/05/google-chrome-on-mac-aw-snap-error-5-when-opening-tabs-accessing-settings/
  • Aw, Snap!: error code SIGTRAP
    • https://askubuntu.com/questions/1322126/every-once-a-while-my-chromium-snap-will-fail-to-load-any-page-a-reboot-always
    • Chrome bug: snap updates in the background causing a crash: https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1914918
  • known crash - app with large page: https://bugs.chromium.org/p/chromium/issues/detail?id=842679

emilyrohrbough (Jun 30 '22 18:06)

Hi @emilyrohrbough, thank you so much for checking out this issue! It has been with us for months and is very frustrating.

What I don't understand is that it works locally on my laptop with npx cypress run, but as soon as Cypress runs via a Docker image in a pipeline, these crashes occur. Can you please explain this to me?

robrich7 (Jul 01 '22 08:07)

@jennifer-shehane Hi Jennifer, can you please tell us if and when the problem will be fixed?

robrich7 (Aug 01 '22 21:08)

If you experience the issue with hanging tests please try disabling the Command Log: https://docs.cypress.io/guides/references/troubleshooting#Disable-the-Command-Log

It helped me to solve the issue with hanging tests.

abezzubets (Sep 21 '22 07:09)

> If you experience the issue with hanging tests please try disabling the Command Log: https://docs.cypress.io/guides/references/troubleshooting#Disable-the-Command-Log
>
> It helped me to solve the issue with hanging tests.

It didn't help for us unfortunately.

cosmith (Sep 28 '22 13:09)

Hey team, any updates or workarounds here?

pkalyan264 (Feb 16 '23 07:02)

I have the same problem, but it's caused by some sort of nasty memory leak, which I have contrived a test to intentionally reproduce.

SIGSTACKFAULT (May 05 '23 16:05)

Hi, just checking if there's any progress on this issue?

rasis2 (Sep 22 '23 04:09)

Any news about this crash, or any workaround?

pat-convex (Oct 05 '23 15:10)