
Cypress 10.8.0: Much slower running in CI and randomly failing tests

vikingair opened this issue 2 years ago

Current behavior

A successful test run with version 10.7.0: ~5 min
A successful test run with version 10.8.0: ~12 min

Also, with 10.7.0 our tests were very stable and almost never failed.

With version 10.8.0, the tests started running into several random timeouts, though this could be related to the very slow execution times.

Browser: Electron 102 (headless)

Desired behavior

I would have expected Cypress to become faster with the recent update because of the memory footprint optimizations.

Test code to reproduce

// code should not be relevant

Cypress Version

10.8.0

Node version

v16.14.0

Operating System

Linux: Ubuntu latest (GitHub Actions workflow)

Debug Logs

No response

Other

Possibly related: https://github.com/cypress-io/cypress/issues/23824

vikingair avatar Sep 14 '22 16:09 vikingair

Also seeing this issue - tests running much more slowly on Cypress 10.8.0, with some hanging while consuming 100% CPU for 2+ hours on CI, until manually stopped.

Reverting back to Cypress 10.7.0, the same test suite finishes in well under 10 minutes.


CI is using the Docker image cypress/base:14.7.0 as a base, with Cypress 10.8.0 installed via NPM.

Exec-ing into a container with a hanging test, the main Cypress process, or at least the one with the lowest process id, was using 99% CPU, and the GPU process was using 37% CPU.

$ ps -elf
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 R root      1468    19 99  80   0 - 9942207 -    18:33 ?        00:10:35 /root/.cache/Cypress/10.8.0/Cypress/Cypress --no-sandbox -- --run-project ./ --ci-build-id Wed Sep 14 2022 18:33:52 GMT+0000 (Coordinated Universal Time) --env <redacted> --key <redacted> --output-path /tmp/tmp-19-QLkCqiVSY4TA --parallel --record true --spec ["<redacted>"] --tag <redacted> --cwd <redacted> --userNodePath /usr/local/bin/node --userNodeVersion 14.7.0
0 S root      1474  1468  0  80   0 - 8446627 SyS_pp 18:33 ?      00:00:00 /root/.cache/Cypress/10.8.0/Cypress/Cypress --type=zygote --no-zygote-sandbox --no-sandbox --enable-crashpad --enable-crashpad
0 S root      1475  1468  0  80   0 - 8446628 SyS_pp 18:33 ?      00:00:00 /root/.cache/Cypress/10.8.0/Cypress/Cypress --type=zygote --no-sandbox --enable-crashpad --enable-crashpad
1 S root      1627  1474 37  80   0 - 8488446 SyS_po 18:33 ?      00:03:03 /root/.cache/Cypress/10.8.0/Cypress/Cypress --type=gpu-process --no-sandbox --disable-dev-shm-usage --enable-crashpad --enable-crash-reporter=<redacted>,no_channel --user-data-dir=/root/.config/Cypress --gpu-preferences=WAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAA4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAIAAAAAAAAAA== --use-gl=angle --use-angle=swiftshader-webgl --shared-files --field-trial-handle=0,i,13177495466960604662,3729574163230753632,131072 --disable-features=SpareRendererForSitePerProcess
0 S root      1638  1468  2  80   0 - 679689 SyS_ep 18:33 ?       00:00:11 Cypress: Config Manager
0 S root      1651  1468  0  80   0 - 8462155 -    18:33 ?        00:00:00 /root/.cache/Cypress/10.8.0/Cypress/Cypress --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=none --no-sandbox --disable-dev-shm-usage --ignore-certificate-errors=true --use-fake-device-for-media-stream --ignore-certificate-errors=true --enable-crashpad --enable-crash-reporter=<redacted>,no_channel --user-data-dir=/root/.config/Cypress --shared-files=v8_context_snapshot_data:100 --field-trial-handle=0,i,13177495466960604662,3729574163230753632,131072 --disable-features=SpareRendererForSitePerProcess --enable-crashpad
1 S root      1836  1475  8  80   0 - 14284109 -   18:34 ?        00:00:41 /root/.cache/Cypress/10.8.0/Cypress/Cypress --type=renderer --enable-crashpad --enable-crash-reporter=<redacted>,no_channel --user-data-dir=/root/.config/Cypress --app-path=/root/.cache/Cypress/10.8.0/Cypress/resources/app --enable-sandbox --no-sandbox --disable-dev-shm-usage --autoplay-policy=no-user-gesture-required --force-device-scale-factor=1 --use-fake-ui-for-media-stream --disable-gpu-compositing --lang=en-US --num-raster-threads=1 --renderer-client-id=4 --launch-time-ticks=6908206572 --shared-files=v8_context_snapshot_data:100 --field-trial-handle=0,i,13177495466960604662,3729574163230753632,131072 --disable-features=SpareRendererForSitePerProcess
0 S root      1849  1468  3  99  19 - 110216 -     18:34 ?        00:00:17 /root/.cache/Cypress/10.8.0/Cypress/resources/app/node_modules/@ffmpeg-installer/linux-x64/ffmpeg -f image2pipe -use_wallclock_as_timestamps 1 -i pipe:0 -y -vcodec libx264 -filter:v crop='floor(in_w/2)*2:floor(in_h/2)*2' -preset ultrafast <redacted>.mp4
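
One variable visible in the process list above is video recording (the ffmpeg process). As a hedged aside, and purely an assumption rather than a confirmed cause, disabling video would rule out encoding overhead while debugging the slowdown:

import { defineConfig } from 'cypress';

export default defineConfig({
    // Video recording is enabled by default in Cypress 10; turning it off removes the
    // ffmpeg process seen in the listing above and rules out encoding as the bottleneck.
    video: false,
});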

ericyhwang avatar Sep 14 '22 22:09 ericyhwang

That's definitely not good! I've been trying to reproduce this using our kitchensink repo, but haven't had any luck. I haven't seen any performance differences between 10.7.0 and 10.8.0 in historical CI runs or when running locally, both inside and outside Docker.

There's likely something in your tests or system setup that's being affected by changes in 10.8.0, but that isn't present in the kitchensink tests.

Is there any more information you can provide about your tests?

  • How many spec files?
  • How many total tests?
  • Can you provide the contents of your cypress.config file?
  • Can you run with debug logs (DEBUG=cypress* cypress run) and post the stdout from the run?

This is clearly a serious issue, but it will be difficult for us to pinpoint the cause of it without being able to reproduce it, so any information you provide can help. Thanks in advance!

chrisbreiding avatar Sep 16 '22 13:09 chrisbreiding

Sure.

  • Number of spec files: 9
  • Total tests: 12
  • Cypress config (using cypress-audit version 1.1.0):
import { defineConfig } from 'cypress';
import { readFileSync } from 'fs';
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
import { lighthouse, pa11y, prepareAudit } from 'cypress-audit';

const conf = JSON.parse(readFileSync(process.env.CONF, 'utf-8'));

export default defineConfig({
    e2e: {
        setupNodeEvents(on, _config) {
            on('before:browser:launch', (_browser, launchOptions: any) => {
                prepareAudit(launchOptions);
            });

            on('task', {
                lighthouse: lighthouse(), // calling the function is important
                pa11y: pa11y(), // calling the function is important
            });
        },
        specPattern: 'src/**/*.cy.{js,jsx,ts,tsx}',
        baseUrl: conf.domain,
        env: {
            CONFIG: JSON.stringify(conf),
        },
        viewportWidth: 1200,
        viewportHeight: 800,
    },
});
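
For context, a hedged sketch of a spec that exercises this setup, assuming the support file imports 'cypress-audit/commands' (which registers cy.lighthouse() and cy.pa11y() on top of the tasks above); the threshold value is illustrative only:

// Spec sketch (assumption: the support file contains `import 'cypress-audit/commands';`)
describe('audits', () => {
    it('passes the Lighthouse and Pa11y audits', () => {
        cy.visit('/');
        cy.lighthouse({ performance: 80 }); // results are forwarded to the 'lighthouse' task
        cy.pa11y();                         // results are forwarded to the 'pa11y' task
    });
});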

> Can you run with debug logs (DEBUG=cypress* cypress run) and post the stdout from the run?

Will try to get this information for you ASAP, but it requires a little effort because we had to roll back the update already.

Did you try to reproduce it with the cypress-audit integration in the kitchensink repo? I didn't test the run without it.

vikingair avatar Sep 17 '22 11:09 vikingair

I've now tested it from my local machine (macOS).

Looking at the debug output, I cannot send it to you in good conscience, because it contains a lot of sensitive information that I'd rather not share (e.g. login credentials, JWTs, the URLs used, etc.).

But my observation was that my tests executed at roughly the same speed on both Cypress versions.

Our GitHub Actions workflow runs on "ubuntu-latest", but I currently have no Linux users on my team to run it locally. If I get more insights, I'll keep you updated.

vikingair avatar Sep 20 '22 08:09 vikingair

@fdc-viktor-luft, could you attempt to recreate this in a Docker container?

And is it correct that you're only seeing the slowdown inside GitHub Actions, not locally on your Mac?

mjhenkes avatar Sep 27 '22 14:09 mjhenkes

@mjhenkes Yes, I can only reproduce this within GitHub Actions.

Didn't try it within a Docker container yet, but I'll try to find some time to test it in a container. If I can reproduce it with our tests, I'll try to create a repro without all of our sensitive data for you.

vikingair avatar Sep 27 '22 16:09 vikingair

We are experiencing the same issue within our GitLab CI pipeline, running in a Docker container:

cypress/browsers:node14.17.0-chrome91-ff89

  │ Cypress:        10.8.0
  │ Browser:        Electron 102 (headless)
  │ Node Version:   v14.17.0 (/usr/local/bin/node)
  │ Specs:          59 found

Some errors printed in the log that might be related:

[331:0929/204824.767614:ERROR:node_bindings.cc(276)] Most NODE_OPTIONs are not supported in packaged apps. See documentation for more details.
libva error: va_getDriverName() failed with unknown libva error,driver_name=(null)
[491:0929/204841.439660:ERROR:gpu_memory_buffer_support_x11.cc(44)] dri3 extension not supported.
Couldn't determine Mocha version

Our config:

import { defineConfig } from 'cypress';
// Note: the original post omitted the imports. `rm` is assumed to come from 'fs/promises';
// cypressTypeScriptPreprocessor and registerCodeCoverageTasks are project-specific helpers.
import { rm } from 'fs/promises';

export default defineConfig({
    videosFolder: 'cypress/videos',
    screenshotsFolder: 'cypress/screenshots',
    downloadsFolder: 'cypress/downloads',
    fixturesFolder: 'cypress/fixtures',
    viewportWidth: 1920,
    viewportHeight: 1080,
    trashAssetsBeforeRuns: false,
    video: true,
    numTestsKeptInMemory: 0,
    reporter: 'cypress-multi-reporters',
    reporterOptions: {
        configFile: 'cypress/cypress-reporter-config.json',
    },
    e2e: {
        // We've imported your old cypress plugins here.
        // You may want to clean this up later by importing these.
        setupNodeEvents(on, config) {
            on('after:spec', async (spec, results) => {
                if (results && results.video) {
                    // Delete the video only if no attempt of any test failed
                    if (results?.tests?.every(test => test?.attempts?.every(attempt => attempt?.state !== 'failed'))) {
                        return rm(results.video, { recursive: true, });
                    }
                }
            });
            on('file:preprocessor', cypressTypeScriptPreprocessor);
            return registerCodeCoverageTasks(on, config);
        },
        baseUrl: 'http://localhost:4200',
        specPattern: './cypress/e2e/**/*',
        supportFile: './cypress/support/e2e.ts',
    },
});
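
As a hedged aside (not part of the config above): one way to narrow down where the extra time goes when comparing 10.7.0 and 10.8.0 is to log each spec's duration from the same after:spec hook, e.g.:

import { defineConfig } from 'cypress';

export default defineConfig({
    e2e: {
        setupNodeEvents(on) {
            on('after:spec', (spec, results) => {
                // results.stats.duration is the spec's wall-clock time in milliseconds
                console.log(`${spec.relative}: ${results?.stats?.duration ?? 'unknown'} ms`);
            });
        },
    },
});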

capc0 avatar Sep 30 '22 06:09 capc0

Preparation

Dockerfile:

FROM node:16

RUN apt-get update
# see https://docs.cypress.io/guides/continuous-integration/introduction#Machine-requirements
RUN apt-get install -y libgtk2.0-0 libgtk-3-0 libgbm-dev libnotify-dev libgconf-2-4 libnss3 libxss1 libasound2 libxtst6 xauth xvfb

RUN npm i -g pnpm

WORKDIR /usr/tests

COPY . .
RUN pnpm i
CMD pnpm test

.dockerignore:

node_modules

Having cypress@10.7.0 installed and running:

docker build -f Dockerfile -t tmp-cypress-10.7.0 .

Same for cypress@10.9.0.

Execution

10.7.0

docker run tmp-cypress-10.7.0

Runs into an error:

Error [ERR_LOADER_CHAIN_INCOMPLETE]: "file:///root/.cache/Cypress/10.7.0/Cypress/resources/app/node_modules/ts-node/esm/transpile-only.mjs 'resolve'" did not call the next hook in its chain and did not explicitly signal a short circuit. If this is intentional, include `shortCircuit: true` in the hook's return.

10.9.0

docker run tmp-cypress-10.9.0

The run times very much reflect the times I see on CI, including many tests failing due to timeouts. However, Docker on macOS is still not very fast with some operations, and because of the error above I couldn't compare against the run times using 10.7.0. So these results might not be of great value for you.

vikingair avatar Sep 30 '22 12:09 vikingair

Same issue here with Azure DevOps pipelines and an Angular project. In our case we migrated from 10.7.0 to 10.9.0, and agent run times went from ~30 minutes to 1 hour 30 minutes.

mmonteiroc avatar Oct 06 '22 09:10 mmonteiroc

Do we have an update on this issue? Is there any info that we could provide that would help you investigate?

mmonteiroc avatar Oct 17 '22 06:10 mmonteiroc

@mmonteiroc would you be able to recreate this in an Azure DevOps pipeline and invite us to the project to see the behavior? I can see if I can get a reproduction running in GitHub Actions later today, but Azure DevOps is a bit difficult for us to work with since it isn't exactly open source.

AtofStryker avatar Oct 17 '22 14:10 AtofStryker

Not sure I can invite you (as we are corporate, and that's up to the IT department...). But I can run any command you want in our DevOps pipeline, with any Cypress version you want, and then provide you the raw output logs. Would that be helpful @AtofStryker?

mmonteiroc avatar Oct 19 '22 08:10 mmonteiroc

@mmonteiroc That could be pretty difficult. What I will likely do is take a stab at GitHub Actions with a Docker approach similar to the one mentioned above, to see if I can get a reliable reproduction.

AtofStryker avatar Oct 19 '22 15:10 AtofStryker

@AtofStryker I have a similar issue, how can I invite you to a private project?

danielvianna avatar Oct 19 '22 19:10 danielvianna

I am running Cypress v10.10.0, upgraded from v10.6.0, and found that the runtime for my tests increased from 15 min to 30 min, using a Buildkite pipeline and a Docker container running Node 16 with the Electron browser.

Incarnation avatar Oct 19 '22 19:10 Incarnation

> @AtofStryker I have a similar issue, how can I invite you to a private project?

@danielvianna if it's a GitHub repo, you can invite my username to the organization/repo.

AtofStryker avatar Oct 19 '22 21:10 AtofStryker

@AtofStryker It's a private GitLab repository using GitLab CI/CD; is that doable?

danielvianna avatar Oct 19 '22 22:10 danielvianna

@danielvianna that should work. My GitLab user profile is here.

AtofStryker avatar Oct 19 '22 22:10 AtofStryker

@AtofStryker - invited. Let me know if you need higher permission levels; we also have MS Teams that we can invite you to for faster collaboration.

danielvianna avatar Oct 20 '22 14:10 danielvianna

@danielvianna Thank you. I saw the invite in GitLab today. Today was fairly eventful, but I'm hoping to take a look first thing Monday.

AtofStryker avatar Oct 21 '22 21:10 AtofStryker

@danielvianna I had a bit of time to take a look through it. It's a bit hard to see from the job itself since polling is at its limits, but I did notice the acceptance tests are on a fairly old version of Node. Have you tried bumping to 16+ to see if the issue improves at all? The CPU usage looks to be completely maxed out for a large majority of the tests. Have you tried throwing more compute resources at the job to get that number lower and see if performance improves? I wouldn't expect these things to be related to a minor version bump, but figured it might be a start to rule some things out.

AtofStryker avatar Oct 24 '22 14:10 AtofStryker

In GitHub Actions we were running on Node > 16 and still ran into these performance issues. And the machine resources used are pre-configured but very powerful 😥

vikingair avatar Oct 24 '22 14:10 vikingair

> I wouldn't expect these things to be related to a minor version bump, but figured it might be a start to rule some things out.

It seems several people have complained here at approximately the same time. Also, in our case, sticking to 10.7.0 fixes the issue. On the right, you can see the result of the PR reverting to 10.7.0 (screenshot attached to the original comment).

DamienCassou avatar Oct 24 '22 14:10 DamienCassou

> It seems several people have complained here at approximately the same time. Also, in our case, sticking to 10.7.0 fixes the issue.

I want to be clear that there is some type of regression since 10.7.0, and I don't want to invalidate what you are experiencing. What I am trying to rule out is resourcing and other external factors that may help isolate the problematic change. My guess is that the memory footprint changes or some of the changes under the hood to WebKit caused a side effect.

The triage team is meeting this afternoon. Since reproduction is limited, I have a few suggestions for how we might be able to isolate the change.

AtofStryker avatar Oct 24 '22 15:10 AtofStryker

Hi Atof, we will try the newest Node version. But it also crashes on my computer as well, basically every 3-4 tests or on long tests. All of that happened after we were forced to upgrade to the newest versions after version 8, which was way faster and more stable.

danielvianna avatar Oct 24 '22 17:10 danielvianna

@danielvianna Interesting that it is happening locally as well. Does this happen in all browsers and what OS are you on?

AtofStryker avatar Oct 24 '22 20:10 AtofStryker

It does on Electron and Chrome.

macOS Monterey, version 12.2.1 (21D62), Mac mini (M1, 2020), 16 GB memory.

danielvianna avatar Oct 26 '22 02:10 danielvianna

My issue is only on Electron: runtime increased from 15 min to 30 min after upgrading to v10.10.0 from v10.6.0, running on a Buildkite pipeline and Docker container.

After switching the browser to Chrome, the runtime dropped from 30 min to 17 min.
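
For anyone wanting to run the same Electron vs. Chrome comparison, a hedged sketch using the Cypress Module API (assuming the cypress package is installed locally and the project's config is picked up automatically):

// Run the same suite in Electron and Chrome and print the total durations.
const cypress = require('cypress');

(async () => {
    for (const browser of ['electron', 'chrome']) {
        const results = await cypress.run({ browser });
        if ('totalDuration' in results) {
            console.log(`${browser}: ${results.totalDuration} ms total`);
        } else {
            // The run failed to start (e.g. the requested browser is not available).
            console.log(`${browser}: failed to start (${results.message})`);
        }
    }
})();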

Incarnation avatar Oct 26 '22 18:10 Incarnation

@AtofStryker I'm not sure if you checked, but we bumped the Node version and the tests became slower or didn't improve at all.

danielvianna avatar Nov 02 '22 00:11 danielvianna

I see @mschile is assigned to this task; I can invite you to our repository if you want.

danielvianna avatar Nov 02 '22 22:11 danielvianna