Weird flicker in video for end to end test on updating to v10.10.0

Open lidiagc opened this issue 1 year ago • 8 comments

Current behavior

We were previously using cypress version 9.7.0 and now want to update to version 10.10.0. However, we sometimes get an issue with the end-to-end tests when running on GitLab CI.

As you can see from the video, a weird flicker happens during the test and it eventually fails. Does it seem that the test is jumping up and down the commands as it was running two instances of the test? The video is only showing the start of the test and is pixelated since this is a private project.

https://user-images.githubusercontent.com/43796105/197748959-0059a24d-fa4a-4f33-95e0-5579ec479ccd.mp4

We haven't been able to reproduce this behavior locally, and it doesn't happen consistently on the CI. It is also not specific to this test: other end-to-end tests occasionally fail, and their videos show the same weird flicker. These tests didn't have this issue before we upgraded the Cypress version.

Desired behavior

We would like to be able to update cypress to version 10, but this issue is delaying the update since we cannot rely on the end-to-end tests for our CI.

Test code to reproduce

We can't provide the full test code since this is a private project.

  • cypress.config.ts
import { defineConfig } from "cypress";

export default defineConfig({
    chromeWebSecurity: false,
    defaultCommandTimeout: 15000,
    viewportWidth: 1920,
    viewportHeight: 1080,
    videoUploadOnPasses: false,

    projectId: "xxxxxx",

    e2e: {
        setupNodeEvents(on, _config) {},

        retries: {
            runMode: 0,
        },

        specPattern: "cypress/e2e/**/*.{ts,tsx}",
    },
});
  • cypress-dockerfile
FROM cypress/included:10.10.0

RUN npm install -g [email protected]

# other packages installed necessary to run our e2e tests
  • gitlab-ci.yml
itest:end-to-end:
    image: ${ITEST_IMAGE}
    stage: itest
    script:
      # backend necessary scripts
  
      - cd frontend
      - echo -e "\e[0Ksection_start:`date +%s`:frontend-deps[collapsed=true]\r\e[0KInstalling frontend deps"
      - CYPRESS_INSTALL_BINARY=0 pnpm install --no-verify-store-integrity --frozen-lockfile -r
      - echo -e "\e[0Ksection_end:`date +%s`:frontend-deps\r\e[0K"
      - cypress run
        --config baseUrl="http://localhost:8001/",defaultCommandTimeout=60000,watchForFileChanges=false,pageLoadTimeout=100000,numTestsKeptInMemory=0
        --browser chrome
        --reporter mocha-junit-reporter
        --reporter-options 'mochaFile=cypress/reports/junit/junit-[hash].xml'
        --env KC_HOSTNAME="localhost:8080"
        --spec clusters-active.ts
    artifacts:
      when: on_failure
      paths:
        - frontend/cypress/screenshots
        - frontend/cypress/videos
        - logs
      expire_in: 15 days
      reports:
        junit: frontend/cypress/reports/junit/*.xml
  • gitlab output
$ cypress run --config baseUrl="http://localhost:8001/",defaultCommandTimeout=60000,watchForFileChanges=false,pageLoadTimeout=100000,numTestsKeptInMemory=0 --browser chrome --reporter mocha-junit-reporter --reporter-options 'mochaFile=cypress/reports/junit/junit-[hash].xml' --env KC_HOSTNAME="localhost:8080" --spec clusters-active.ts
[10659:1017/234749.919499:ERROR:node_bindings.cc(279)] Most NODE_OPTIONs are not supported in packaged apps. See documentation for more details.
[10659:1017/234750.894445:ERROR:zygote_host_impl_linux.cc(263)] Failed to adjust OOM score of renderer with pid 10889: Permission denied (13)
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[10889:1017/234750.909020:ERROR:gpu_memory_buffer_support_x11.cc(44)] dri3 extension not supported.
====================================================================================================
  (Run Starting)
  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Cypress:        10.10.0                                                                        │
  │ Browser:        Chrome 100 (headless)                                                          │
  │ Node Version:   v16.14.2 (/usr/local/bin/node)                                                 │
  │ Specs:          1 found (clusters-active.ts)                                                   │
  │ Searched:       cypress/e2e/customer/clusters-active.ts                                        │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                                    
  Running:  clusters-active.ts                                                              (1 of 1)
Timed out waiting for the browser to connect. Retrying...
  (Results)
  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Tests:        9                                                                                │
  │ Passing:      0                                                                                │
  │ Failing:      1                                                                                │
  │ Pending:      0                                                                                │
  │ Skipped:      8                                                                                │
  │ Screenshots:  2                                                                                │
  │ Video:        true                                                                             │
  │ Duration:     1 minute, 40 seconds                                                             │
  │ Spec Ran:     clusters-active.ts                                                               │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘
  (Screenshots)
  -  /builds/.../frontend/cypress/screenshots/clusters-act     (1280x720)
     ive.ts/test -- before all hook (failed).png                        
  -  /builds/.../frontend/cypress/screenshots/clusters-act     (1280x720)
     ive.ts/test -- after all hook (failed).png                         
  (Video)
  -  Started processing:  Compressing to 32 CRF                                                     
  -  Finished processing: /builds/.../frontend/cypress/vid    (6 seconds)
                          eos/clusters-active.ts.mp4                                                
====================================================================================================
  (Run Finished)
       Spec                                              Tests  Passing  Failing  Pending  Skipped  
  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ ✖  clusters-active.ts                       01:40        9        -        1        -        8 │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘
    ✖  1 of 1 failed (100%)                     01:40        9        -        1        -        8  

Cypress Version

10.10.0

Node version

v16.14.2

Operating System

cypress/included:10.10.0

Debug Logs

No response

Other

Which debug logs would be more relevant for this? DEBUG=cypress:* has too many logs, so the GitLab job exceeds the maximum logs saved before the test finishes.

lidiagc avatar Oct 25 '22 11:10 lidiagc

Hi @lidiagc 👋 , thanks for logging this issue.

Does it seem that the test is jumping up and down the commands as it was running two instances of the test?

Yes, it does look like that though I can't think of anything that would cause this type of behavior.

Does this happen on all browsers or just Chrome? If you run just a single spec does it happen? Does it also happen in Cypress 10.0.3?

mschile avatar Oct 31 '22 22:10 mschile

Hi @mschile, thank you for replying!

We have in total 4 jobs that run our end-to-end tests:

  • (1/4): runs only one spec file (the one I mentioned in the issue)
  • (2/4): runs two spec files
  • (3/4): runs two spec files
  • (4/4): runs four spec files

First, I tested with version 10.10.0 and browser Electron:

  • (1/4) The same issue happened at least once

  • (2/4) The flicker happened on the first spec and the test failed. The second spec passed.

  • (3/4) Similarly, the flicker happened on the first spec and the test failed. The second spec passed.

  • (4/4) Again, the flicker happened on the first spec and the test failed. The following three spec files passed.

Our tests are not prepared to run with Firefox, so they all failed but not because of the flicker.

Then, I downgraded to version 10.0.3 and ran the tests a couple of times both with Chrome and Electron and the flicker issue never appeared. I'm guessing something after that version changed so the issue happens occasionally?

It's good to know that there is a version of Cypress 10 that doesn't have this issue; however, we wouldn't want to upgrade to this version because of the relative path issue that was fixed in version 10.9.0. Our component tests have a lot of similar names, and what differentiates them is their path. Upgrading to version 10.0.3 wouldn't be viable.

Let me know if there are other scenarios you want me to test!

lidiagc avatar Nov 03 '22 14:11 lidiagc

Thanks for the update @lidiagc! I would love to narrow in on which Cypress version caused the regression. My initial guess is that Cypress 10.8.0 may have broken it. Would you be able to try 10.7.0 and 10.8.0?

mschile avatar Nov 03 '22 23:11 mschile

I tried versions 10.7.0 and 10.8.0 and the flicker never happened. Then, I upgraded to version 10.9.0 and it occurred. I suppose something in that version caused the regression?

lidiagc avatar Nov 04 '22 17:11 lidiagc

@lidiagc, thanks for determining which version caused the regression. Unfortunately, I haven't been able to reproduce the flickering. I know you aren't able to provide a link to your private repository, but are you able to recreate the issue in a public one or possibly using the cypress-test-tiny project?

mschile avatar Nov 08 '22 00:11 mschile

Hi @lidiagc. We have a theory about things that might be causing video oddities. I created a 10.9 custom binary with some code tweaks to test out that theory. Can you try this binary to see if it works?

npm install https://cdn.cypress.io/beta/npm/10.9.1/linux-x64/10.9.0-minus-video-refactor-c2135e7e0e6b269a755e7f4309f90630af81d3b9/cypress.tgz

ryanthemanuel avatar Nov 17 '22 16:11 ryanthemanuel

Hi @ryanthemanuel, thank you for looking into this. I installed the custom binary and ran the end-to-end tests on GitLab. The flicker still happens and it seems way more frequent than before.

We have recently upgraded to version 11.0.1 and we have encountered the flicker a couple of times, but it seemed way less frequent than in version 10, hence our decision to upgrade.

lidiagc avatar Nov 18 '22 14:11 lidiagc

@lidiagc, given that you're seeing this less frequently on 11.0.1 would you be ok with us closing this issue?

mjhenkes avatar Nov 23 '22 14:11 mjhenkes

Hi @mjhenkes, I've been struggling to debug this issue for the last week or two. Based on the Cypress debug logs, we can see that Cypress launches one instance of Chrome but then fails to connect to it. It then launches another instance very quickly afterwards, and that one does connect successfully. Both instances of Chrome are running in parallel and executing all the same tests. This causes duplicate logs, duplicate commands, and multiple DOM snapshots, which all get compiled together into a video that appears to be flickering but is really just oscillating between the different browsers running the same tests.

This still occurs on versions of Cypress up to 12.2.0 and started around version 10.10.0. It doesn't occur for us on version 10.0.0. We have been able to reduce the occurrences of this issue by increasing CYPRESS_INTERNAL_BROWSER_CONNECT_TIMEOUT to a very high value so that it eventually connects to Chrome.

Do we know why the newer version of Cypress fails to connect within the timeout period? And when it fails to connect, how come it isn't aware that an instance of Chrome is running the tests in the background while it launches another one?

chasemgray avatar Jan 02 '23 02:01 chasemgray
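
To illustrate the failure mode described in the comment above, here is a minimal sketch of a launch loop that retries after a connect timeout. It is a toy model, not Cypress code: `simulateLaunch`, `BrowserInstance`, and the `killOnTimeout` flag are hypothetical names, and the delays are made-up numbers. It only shows why a retry that does not terminate the first attempt can leave two browsers alive, and why a larger timeout hides the problem.

```typescript
// Toy model of "launch, wait for connect, retry on timeout".
// All names are hypothetical; this is not how Cypress is implemented.
interface BrowserInstance {
  attempt: number;   // 1 for the first launch, 2 for the retry, ...
  startupMs: number; // how long this instance takes to connect
  alive: boolean;    // still running, so it will eventually execute the tests
}

function simulateLaunch(
  startupDelaysMs: number[],
  connectTimeoutMs: number,
  killOnTimeout: boolean
): BrowserInstance[] {
  const launched: BrowserInstance[] = [];
  let attempt = 0;
  for (const startupMs of startupDelaysMs) {
    attempt += 1;
    const instance: BrowserInstance = { attempt, startupMs, alive: true };
    launched.push(instance);
    if (startupMs <= connectTimeoutMs) {
      break; // connected in time, no retry needed
    }
    // Timed out: the slow instance keeps starting up unless it is killed.
    if (killOnTimeout) {
      instance.alive = false;
    }
  }
  return launched.filter((b) => b.alive);
}

// Slow first launch (90 s), fast retry (5 s), 60 s timeout.
const leaked = simulateLaunch([90000, 5000], 60000, false);
const cleaned = simulateLaunch([90000, 5000], 60000, true);
const patient = simulateLaunch([90000, 5000], 120000, false);

console.log(leaked.length);  // 2: both browsers end up running the same tests
console.log(cleaned.length); // 1: the timed-out browser was terminated
console.log(patient.length); // 1: a longer timeout lets the first one connect
```

In this model, raising the timeout only avoids the retry. The leak itself goes away only when the timed-out instance is terminated or ignored.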

Hi @chasemgray , can you please open a new issue with your specific problem and a reproducible example?

nagash77 avatar Jan 03 '23 14:01 nagash77

Hi, just wanted to update here that we are still experiencing what @chasemgray described in Cypress 12.1.0. We occasionally get failed tests in the CI with the video showing the same flickering behavior first described in https://github.com/cypress-io/cypress/issues/24377#issue-1422284433.

lidiagc avatar Jan 03 '23 14:01 lidiagc

We have an open internal ticket with cypress support (10338) just in case the team wants to reference it there.

chasemgray avatar Jan 03 '23 18:01 chasemgray

  • The code that is failing to detect or terminate a browser is here: https://github.com/cypress-io/cypress/blob/56bebb109e011d644d91237f070191058249d2e5/packages/server/lib/modes/run.ts#L486-L487

This issue also appears to be related to this problem: https://github.com/cypress-io/cypress/issues/22825 (multiple Chrome browsers cause duplicate calls to .next() in the Cypress middleware, which causes issues).

Looking back through our logs it appears this issue has existed for a very long time in Cypress if Cypress fails to launch or terminate a browser. We have old runs that I looked back at and found the same symptoms. It got much worse on some minor version of 10.X.X.

Here is a screenshot of where it fails to detect that chrome launched

Screen Shot 2023-01-03 at 12 05 38 PM

chasemgray avatar Jan 03 '23 19:01 chasemgray

One big change I can see here is the change from Bluebird.join to Bluebird.all https://github.com/cypress-io/cypress/commit/3c2fea216bc3ccb448ffa28a68d88e40ab7e1d82#diff-23312a21d720a74c51c81d569ca48200ab80885b2f4415df78e39cea5685788fL551

Bluebird.join's last argument is supposed to be a function to call, and Bluebird.all expects an array of promises. It seems like this would significantly affect waiting properly for the browser launch if this wasn't an intended change (which I'm assuming, since the pull request was converting code to TypeScript). See http://bluebirdjs.com/docs/api/promise.join.html and http://bluebirdjs.com/docs/api/promise.all.html

Maybe I'm completely off. I was just looking at the code history to see what could have changed this to make it so much worse.

chasemgray avatar Jan 03 '23 19:01 chasemgray

@chrisbreiding @nagash77 any update on the issue?

chasemgray avatar Jan 04 '23 23:01 chasemgray

Hi @chasemgray , no update just yet. @chrisbreiding is hard at work diving in.

nagash77 avatar Jan 05 '23 15:01 nagash77

@chasemgray Thanks for investigating this and I think you may be onto something, but unfortunately I don't think that exact change is what's causing the issue. The change from Bluebird.join to Bluebird.all in that commit is functionally equivalent.

From the Bluebird.join doc:

This behavior has been deprecated but is still supported partially - when the last argument is an immediate function value the new semantics will apply

Bluebird checks if the last argument is a function (as opposed to a promise) and uses the documented behavior. In the case of the code in question, both functions return a promise, so Bluebird uses the old semantics.

I do think it's the originally intended behavior that we race the two promises of connecting to the socket and launching the browser before moving on, but that's not to say there isn't a bug or race condition there. The browser launching process is fairly complex, so there's a lot of potential for something to go wrong. I think it's a good area to poke at to determine the cause for this issue, so I'm digging into it more and will hopefully come up with the root of the problem.

chrisbreiding avatar Jan 06 '23 17:01 chrisbreiding
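
The equivalence described in the comment above can be sketched with native promises. This is a simplified model of the documented `join` semantics, not Bluebird's implementation: `joinLike`, `connectToSocket`, and `launchBrowser` are hypothetical names chosen for illustration. When the last argument is a function, it is called with the resolved values; otherwise every argument is treated as a value to wait on, which matches waiting on an array with `all`.

```typescript
// Simplified model of join-style argument handling, built on native promises.
// joinLike is a hypothetical helper, not Bluebird's actual code.
function joinLike(...args: unknown[]): Promise<unknown> {
  const last = args[args.length - 1];
  if (typeof last === "function") {
    // Documented semantics: the trailing function receives the resolved values.
    const handler = last as (...values: unknown[]) => unknown;
    return Promise.all(args.slice(0, -1)).then((values) => handler(...values));
  }
  // No trailing function: every argument is awaited, same as Promise.all.
  return Promise.all(args);
}

// Hypothetical stand-ins for the two operations discussed in the thread.
const connectToSocket = (): Promise<string> => Promise.resolve("socket connected");
const launchBrowser = (): Promise<string> => Promise.resolve("browser launched");

// Both calls wait for both promises and resolve to the same array.
joinLike(connectToSocket(), launchBrowser()).then((r) => console.log(r));
Promise.all([connectToSocket(), launchBrowser()]).then((r) => console.log(r));
```

Under this model, swapping one call for the other changes nothing about when the code proceeds, so a difference in behavior would have to come from somewhere else in the launch path.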

@chrisbreiding It seems like there are two issues:

  • Increased frequency of an unconnected chrome instance or one that fails to be terminated when it doesn't launch in time.
  • Cypress runner allows messages to come from multiple browsers. I would think that it would be able to detect, either via port or something else, if the messages come from the browser it believes launched successfully. Though this wouldn't prevent the rogue browser instance from causing side effects, etc.

chasemgray avatar Jan 08 '23 17:01 chasemgray
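
The second point in the comment above, accepting messages only from the browser the server believes it launched, can be sketched as follows. This is a hypothetical illustration and not the approach Cypress took: `SingleBrowserGate`, `RunnerMessage`, and the `browserId` field are invented names, and real browsers would need some way to carry such an identifier.

```typescript
// Hypothetical guard that accepts messages from one tracked browser only.
// Names and message shape are assumptions, not Cypress internals.
interface RunnerMessage {
  browserId: string;
  event: string;
}

class SingleBrowserGate {
  private activeBrowserId: string | null = null;
  readonly accepted: RunnerMessage[] = [];
  readonly rejected: RunnerMessage[] = [];

  // Record which launch attempt the server is currently waiting on.
  track(browserId: string): void {
    this.activeBrowserId = browserId;
  }

  // Accept the message only if it comes from the tracked browser.
  receive(message: RunnerMessage): boolean {
    if (message.browserId !== this.activeBrowserId) {
      this.rejected.push(message);
      return false;
    }
    this.accepted.push(message);
    return true;
  }
}

const gate = new SingleBrowserGate();
gate.track("attempt-1"); // first launch, which later times out
gate.track("attempt-2"); // the retry replaces it as the tracked browser

// The stale browser finally starts and sends the same event as the retry.
gate.receive({ browserId: "attempt-1", event: "command:start" });
gate.receive({ browserId: "attempt-2", event: "command:start" });

console.log(gate.accepted.length); // 1
console.log(gate.rejected.length); // 1
```

As the comment notes, a filter like this would keep duplicate events out of the log and video, but the stale browser would still be running and could cause side effects against the application under test.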

@chasemgray @lidiagc do we have a definite way to reproduce this issue or get into this state where we can see the flickering?

AtofStryker avatar Jan 23 '23 21:01 AtofStryker

Significantly slowing down the Chrome launch seems to be the most common cause we see. Otherwise it might launch in time for the Cypress logic to detect it.

Isn't there a way for cypress to just fail when it detects messages coming in on multiple ports?

chasemgray avatar Jan 26 '23 02:01 chasemgray

Also, if you want to look at this over zoom I can walk you through a lot of the debug details that showed us this was due to chrome launching twice.

chasemgray avatar Jan 26 '23 02:01 chasemgray

@lidiagc and @chasemgray, I was able to reproduce the video flicker locally with cypress run by lowering the CYPRESS_INTERNAL_BROWSER_CONNECT_TIMEOUT to 500 ms. With this reproduction, we'll be able to investigate the root cause and come up with a solution. Thank you for your continued patience on this issue.

https://user-images.githubusercontent.com/2002044/216637938-8e8a5eb0-d6c8-42fc-91b3-b786179abcd8.mp4

Isn't there a way for cypress to just fail when it detects messages coming in on multiple ports?

Yes, that is probably possible though my guess is the root cause is higher up and we'll want to figure out why the browser is not closing as expected in the first place.

mschile avatar Feb 03 '23 15:02 mschile

https://github.com/cypress-io/cypress/pull/25898 should fix this issue by preventing more than one browser from being connected at a time. It will be out with the next release, but if you'd like to check it out ahead of time, I'd recommend trying out the prerelease build for the latest commit on the develop branch.

Thanks again, @chasemgray, for digging into this. Recognizing that multiple browsers were being connected helped pinpoint the root cause of the issue.

chrisbreiding avatar Feb 24 '23 15:02 chrisbreiding

Released in 12.7.0.

This comment thread has been locked. If you are still experiencing this issue after upgrading to Cypress v12.7.0, please open a new issue.

cypress-bot[bot] avatar Feb 25 '23 02:02 cypress-bot[bot]