Recover from renderer / browser crashes
Related to #348.
It is actually possible for Cypress to implement recovery strategies for when the renderer (or browser process) crashes during a test run, something like recoverFromRendererCrashes: true by default.
There is already a mechanism for Cypress to "reload" mid-run, rebuild the state of every previously run test, skip over previously run tests, and continue with the next one in line.
In fact this is exactly what cy.visit already does under the hood.
We can utilize this same process upon a renderer / browser process crashing to continue on with the run.
So it may look something like this:
(Running Tests)
✓ test 1 - foo
✓ test 2 - bar
✓ test 3 - baz
Oh noes the renderer process crashed... we will attempt to recover
...Restarting tests at 'test 4 - quux'
✓ test 4 - quux
✓ test 5 - ipsum
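The flow shown in that output can be sketched as a small simulation. This is only an illustration under stated assumptions, not Cypress internals: `runWithRecovery`, `Outcome`, `RunSummary`, and the `runTest` callback are hypothetical names, and a real implementation would also have to relaunch the browser and rebuild UI state before retrying.

```typescript
// Hypothetical sketch: these names are illustrative, not Cypress APIs.
type Outcome = "passed" | "failed" | "crashed"

interface RunSummary {
  results: Array<{ title: string; outcome: Outcome }>
  restarts: number
}

function runWithRecovery(
  titles: string[],
  runTest: (title: string) => Outcome,
  maxRestarts = 3
): RunSummary {
  const results: RunSummary["results"] = []
  let restarts = 0
  let next = 0 // index of the first test that has not finished yet

  while (next < titles.length) {
    const title = titles[next]
    const outcome = runTest(title)
    if (outcome === "crashed") {
      if (restarts >= maxRestarts) {
        throw new Error(`Gave up after ${restarts} restarts at '${title}'`)
      }
      restarts++
      console.log(`...Restarting tests at '${title}'`)
      continue // retry the same test; already finished tests are skipped
    }
    results.push({ title, outcome })
    next++
  }
  return { results, restarts }
}
```

The restart cap is a design choice of this sketch: without it, a test that crashes the renderer deterministically would loop forever.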
Taking this a step further, we are starting to see several patterns emerge with how and why renderer processes crash - it is almost always related to extremely long test runs in a memory starved environment (such as Docker).
It may even be a good idea for us to always preemptively "break up" headless runs by spec file.
In other words, we could have an option like restartBrowserBetweenSpecFiles: true which would automatically kill the renderer / browser process before moving on to a different spec file (but still rebuild the state of the UI correctly, and still have a single contiguous video recording).
To the user it would look like nothing is really different, but internally the renderer process would be killed and then restarted.
This would forcefully purge primed memory from the process, which could keep environments like Docker from ever crashing to begin with.
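A minimal sketch of the per-spec restart idea, assuming a hypothetical `BrowserHandle` abstraction (neither it nor `runSpecsWithFreshBrowser` exists in Cypress). Each spec file gets a freshly launched browser, and that browser is always killed before the next spec starts, so memory held by one spec cannot accumulate into the next.

```typescript
// Hypothetical sketch: BrowserHandle is an assumed abstraction, not a Cypress type.
interface BrowserHandle {
  runSpec(specFile: string): number // returns the number of failed tests
  kill(): void
}

function runSpecsWithFreshBrowser(
  specFiles: string[],
  launch: () => BrowserHandle
): number {
  let failures = 0
  for (const spec of specFiles) {
    const browser = launch() // new process, so no memory carried over
    try {
      failures += browser.runSpec(spec)
    } finally {
      browser.kill() // always release the process, even if runSpec throws
    }
  }
  return failures
}
```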
Depends on: #6170
We actually have these crashes halfway in a single spec and we have stalling too. I tried debugging this with strace and it seems to be constantly trying to acquire some locks. Our app seems to make the browser allocate 400+ MB of memory fast and the whole suite can go up to 2 GB... So resetting between specs might not be enough. Maybe between it/test is also an option?
Setting --ipc=host does fix this, but I wonder what happens if two instances of the test run simultaneously. Could a clash occur?
How could two instances of the test run occur simultaneously? If you wanted to parallelize you would do it over two different docker containers.
yes, two docker instances. It might be a false fear of a clash. I'm completely unaware/ignorant of what the two docker instances do share with --ipc=host
Hi, I am running test cases on an AWS EC2 small instance and I am having this issue: https://on.cypress.io/renderer-process-crashed. Is there any way to avoid this?
Did you try the --ipc=host fix?
But I am not using Docker
if not sandboxed, you might have multiple chrome instances fighting over resources. What is your setup? any concurrency? are you open to a different setup?
Any update on this? I'm now getting this error too: Chromium usually crashes when running a large number of test suites.
This issue has been superseded by this: https://github.com/cypress-io/cypress/issues/681
That will remove the need to recover since it fixes the problem at its core
We've started hitting this fairly frequently now too
I'm having this happen randomly on travis-ci with cypress 3.0.2 (I just recently started using cypress so no clue if it happened in a previous version). It might be good to add this flag even with #681 resolved.
Edit: I was able to resolve my issue by only calling .visit() once and resetting the state of the application between tests. I know that's not ideal, but it works for now.
In hindsight my fix with --ipc=host might be related to the shared memory issue I described in https://github.com/cypress-io/cypress/issues/350 and giving the container more shared memory might resolve crashes.
I'm also getting this issue now with cypress v3.1.0. Any updates?
Hi cypress team!
We are also getting this error when we use cypress run as well as cypress open.
We noticed that it happens more when we use cy.wait. We can consistently reproduce it when we use cy.wait with a value greater than 20000. This is on our circle-ci linux containers fyi.
Hi, I'm currently trying to use cypress in Gitlab CI. I figured out most parts, except the browser crashing.
My current Gitlab CI test job is the following:
test_dev:
  only:
    - dev
  stage: test
  image: cypress/base:10
  script:
    - npm i --save-dev cypress
    - $(npm bin)/cypress run --reporter junit --reporter-options "mochaFile=results_[hash].xml,toConsole=true"
  artifacts:
    paths:
      - cypress/videos
    reports:
      junit: results_*.xml
    expire_in: 1 week
This works great when the browser doesn't crash, including test reporting in Gitlab's merge requests. However, it fails about 50% of the time. Using the --ipc=host flag is afaik not an option in Gitlab CI.
Have you tried increasing the shared memory instead, like I describe in https://github.com/cypress-io/cypress/issues/350 ?
I am using shared runners on gitlab ci, and shm-size doesn't seem to be an option for shared runners. Thanks anyway
I think you can configure it using this documentation https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
Hi, please provide a fix or an explanation for this issue. It always happens on one test case (and only one test case). I do not think it has to do with memory, but there is no way to know. I was able to reproduce it locally without Docker. I think it has to do with origin (subdomain) changes. Thanks
EDIT: I just ran it in debug mode; unfortunately there is no way of knowing what is causing this problem.
Hi, we're also experiencing this issue in Kubernetes (using Jenkins as our CI engine). Would be happy to provide additional information if helpful.
I've recently started running into the issue, as our codebase starts to acquire more dependencies. It's intermittent and unpredictable. Sometimes I get a passing test, sometimes it fails the moment it begins.
After more experimentation, I've found that using the cypress/browsers:chrome69 image instead of the cypress/base:10 made the issue go away. This issue is likely to be tied to an older version of electron being unable to handle a larger codebase, and I think more effort should go into updating electron.
One useful thing in the meantime would be if Cypress had some way to communicate to the caller that the browser failed. Then I could re-run the test inside CI automatically. Maybe the exit code from the npm call could be different? Or there could be some other way to determine that tests failed because of Chrome failing and not because of the tests themselves failing. Could this be added in the meantime? Recovery could then be done outside of Cypress.
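Until something like that exists, one workaround is to retry from the outside by inspecting the run's output. The sketch below assumes that a crashed run prints the https://on.cypress.io/renderer-process-crashed link mentioned earlier in this thread; `RunResult`, `looksLikeRendererCrash`, and `runWithCrashRetry` are hypothetical names, and the `run` callback stands in for however you invoke `cypress run` and capture its output.

```typescript
// Assumption: a renderer crash prints a link containing this marker.
const CRASH_MARKER = "renderer-process-crashed"

interface RunResult {
  exitCode: number
  output: string // combined stdout and stderr of one test run
}

function looksLikeRendererCrash(result: RunResult): boolean {
  // Deliberately ignores exitCode: the output is the only signal used here.
  return result.output.indexOf(CRASH_MARKER) !== -1
}

function runWithCrashRetry(run: () => RunResult, maxAttempts = 3): RunResult {
  let last = run()
  for (
    let attempt = 1;
    attempt < maxAttempts && looksLikeRendererCrash(last);
    attempt++
  ) {
    last = run() // re-run only when the previous attempt crashed
  }
  return last
}
```

A real test failure without the marker is returned immediately, so genuine failures are not masked by retries.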
I think that since this issue was created, a better fix for the problem has become available: asking Chrome not to use /dev/shm. I opened #3633 for more details about this.
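As a rough sketch of that approach: Chrome has a `--disable-dev-shm-usage` switch, and Cypress lets a plugins file modify launch arguments through the `before:browser:launch` event. The code below assumes the Cypress 3.x handler signature, where the handler receives the browser and an array of arguments (later versions pass a launch-options object instead). `registerShmWorkaround` and the local types are hypothetical names written for this example.

```typescript
// Local stand-in types so the sketch is self-contained (not Cypress's own typings).
type BrowserInfo = { name?: string }
type LaunchHandler = (browser: BrowserInfo, args: string[]) => string[]
type On = (event: "before:browser:launch", handler: LaunchHandler) => void

function registerShmWorkaround(on: On): void {
  on("before:browser:launch", (browser, args) => {
    if (browser.name === "chrome") {
      // Ask Chrome to avoid /dev/shm, which is often tiny inside containers.
      args.push("--disable-dev-shm-usage")
    }
    return args
  })
}
```

In a plugins file this would be called with the `on` function that Cypress passes in. The switch only applies when running with --browser chrome, so the default Electron browser is left untouched.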
I'm hitting this issue on a small digital ocean droplet (no docker / container). The test runs perfectly a dozen or so times and then starts crashing with this error. If I reboot the droplet it starts working again then eventually dies. Looks like a memory leak to me.
There appears to be plenty of memory in my docker container
df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 30G 8.0K 30G 1% /dev/shm
I'm also unable to figure out how to add the --ipc=host flag for my CircleCI build... Doesn't appear to be an option.
I am using shared runners on gitlab ci, and shm-size doesn't seem to be an option for shared runners. Thanks anyway
I have this same issue.
A few days ago I started facing the same issue even though no changes were made. It's running on Travis without Docker and against a separate app that is not installed in the same code base.
What's interesting is that switching to --browser chrome seems to help, so it looks like the problem is related to Electron: it fails whether Electron runs headless or not. However, with Chrome, you lose the video recording.
Any progress on this topic? @brian-mann
I've become very impatient waiting for the Cypress folks to fix these crashing issues. In the meantime, I've created a very similar API using selenium and am having no memory issues. There's no recording of tests, but at least it's reliable. Here's a code snippet for you if you want to try it out.
import { Builder, ThenableWebDriver, By, WebElement, Key, Condition } from "selenium-webdriver"

/**
 * Wrap any promise coming from the Selenium driver so that we can
 * get stack traces that point to our code.
 */
async function wrapError<T>(p: Promise<T>) {
  const e = new Error()
  e["__wrapError"] = true
  try {
    const result = await p
    // Wait just a little bit in case the browser is about to navigate
    // or something.
    await new Promise(resolve => setTimeout(resolve, 20))
    return result
  } catch (error) {
    if (error["__wrapError"]) {
      throw error
    }
    e.message = error.message
    throw e
  }
}

async function waitFor(
  driver: ThenableWebDriver,
  fn: () => Promise<boolean | object>,
  timeout = 2000
) {
  await driver.wait(
    new Condition("wait", async () => {
      try {
        const result = await fn()
        return Boolean(result)
      } catch (error) {
        return false
      }
    }),
    timeout
  )
}

class Element {
  private promise: Promise<WebElement>
  then: Promise<WebElement>["then"]
  catch: Promise<WebElement>["catch"]

  constructor(
    public driver: ThenableWebDriver,
    promise: Promise<WebElement> | WebElement
  ) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  /** Map in the monadic sense. */
  map(fn: (elm: WebElement) => Promise<WebElement | undefined | void>) {
    return new Element(
      this.driver,
      wrapError(
        this.promise.then(async elm => {
          const result = await fn(elm)
          if (result) {
            return result
          } else {
            return elm
          }
        })
      )
    )
  }

  waitFor(fn: (elm: WebElement) => Promise<boolean | object>) {
    return this.map(elm => waitFor(this.driver, () => fn(elm)))
  }

  mapWait(fn: (elm: WebElement) => Promise<WebElement>) {
    return this.waitFor(fn).map(fn)
  }

  click() {
    return this.map(elm => elm.click())
  }

  clear() {
    return this.map(elm => elm.clear())
  }

  type(text: string) {
    return this.map(elm => elm.sendKeys(text))
  }

  enter() {
    return this.map(elm => elm.sendKeys(Key.RETURN))
  }

  backspace() {
    return this.map(elm => elm.sendKeys(Key.BACK_SPACE))
  }

  find(selector: string) {
    return this.mapWait(elm => {
      return elm.findElement(By.css(selector))
    })
  }

  findAll(selector: string) {
    return new Elements(
      this.driver,
      this.promise.then(elm => {
        return waitFor(this.driver, () =>
          elm.findElements(By.css(selector))
        ).then(() => {
          return elm.findElements(By.css(selector))
        })
      })
    )
  }

  contains(text: string) {
    return this.mapWait(elm => {
      // TODO: escape text.
      // https://stackoverflow.com/questions/12323403
      return elm.findElement(By.xpath(`//*[contains(text(), '${text}')]`))
    })
  }

  clickText(text: string) {
    return this.contains(text).click()
  }
}

class Elements {
  private promise: Promise<Array<WebElement>>
  then: Promise<Array<WebElement>>["then"]
  catch: Promise<Array<WebElement>>["catch"]

  constructor(
    public driver: ThenableWebDriver,
    promise: Promise<Array<WebElement>> | Array<WebElement>
  ) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  /** Map in the monadic sense. */
  map(
    fn: (
      elm: Array<WebElement>
    ) => Promise<Array<WebElement> | undefined | void>
  ) {
    return new Elements(
      this.driver,
      wrapError(
        this.promise.then(async elms => {
          const result = await fn(elms)
          if (Array.isArray(result)) {
            return result
          } else {
            return elms
          }
        })
      )
    )
  }

  waitFor(fn: (elm: Array<WebElement>) => Promise<boolean | object>) {
    return this.map(elm => waitFor(this.driver, () => fn(elm)))
  }

  mapWait(fn: (elm: Array<WebElement>) => Promise<Array<WebElement>>) {
    return this.waitFor(fn).map(fn)
  }

  clickAll() {
    return this.map(async elms => {
      await Promise.all(elms.map(elm => elm.click()))
    })
  }

  atIndex(index: number) {
    return new Element(
      this.driver,
      wrapError(
        this.promise.then(elms => {
          const elm = elms[index]
          if (!elm) {
            throw new Error("Element not found!")
          }
          return elm
        })
      )
    )
  }
}

export class Browser {
  private promise: Promise<void>
  then: Promise<void>["then"]
  catch: Promise<void>["catch"]

  constructor(public driver: ThenableWebDriver, promise?: Promise<void>) {
    this.promise = Promise.resolve(promise)
    this.then = this.promise.then.bind(this.promise)
    this.catch = this.promise.catch.bind(this.promise)
  }

  visit(route: string) {
    return new Browser(
      this.driver,
      wrapError(
        this.promise.then(async () => {
          await this.driver.get(route)
        })
      )
    )
  }

  // NOTE: rerender() and flushTransactions() below call helper functions
  // that are not included in this snippet.
  rerender() {
    return new Browser(this.driver, wrapError(rerender(this.driver)))
  }

  flushTransactions() {
    return new Browser(this.driver, wrapError(flushTransactions(this.driver)))
  }

  find(selector: string) {
    return new Element(
      this.driver,
      wrapError(
        this.promise
          .then(() => {
            return waitFor(this.driver, async () =>
              this.driver.findElement(By.css(selector))
            )
          })
          .then(() => {
            return this.driver.findElement(By.css(selector))
          })
      )
    )
  }

  getClassName(className: string) {
    return this.find("." + className)
  }

  getTitle() {
    return this.driver.getTitle()
  }

  waitFor(fn: () => Promise<boolean>, timeout = 2000) {
    // Pass the timeout through; the original call dropped it.
    return new Browser(this.driver, waitFor(this.driver, fn, timeout))
  }

  waitToLeave(url: string) {
    return new Browser(
      this.driver,
      wrapError(
        waitFor(
          this.driver,
          async () => {
            const currentUrl = await this.driver.getCurrentUrl()
            return url !== currentUrl
          },
          10000
        )
      )
    )
  }

  waitForRoute(url: string) {
    return new Browser(
      this.driver,
      wrapError(
        waitFor(
          this.driver,
          async () => {
            const currentUrl = await this.driver.getCurrentUrl()
            return url === currentUrl
          },
          10000
        )
      )
    )
  }
}
We're seeing this issue crop up on Drone, which also doesn't support the --ipc=host option. Our containers already have 16GB memory. Some notes on the behavior:
- Electron logs an error message when it crashes, but doesn't actually fail the test run. Our build is green despite the fact that half the tests caused a renderer crash.
- Chrome doesn't even log a message; it dies silently and the test run hangs forever.
- The crash does appear to happen at the exact same time on every run, but it's not clear what we're doing to cause it. Rearranging our test code or skipping certain tests resolves the problem temporarily, but it always creeps back in.
I haven't contributed to Cypress before, but I'd be willing to take a stab at fixing the problem if someone (@brian-mann ?) can show me where to start. My team has lost a ton of time troubleshooting this and I'd love to put it to bed.
@nmuth Please see our contributing guide on how to start: https://github.com/cypress-io/cypress/blob/develop/CONTRIBUTING.md
Are you using version 3.3.1?