playwright icon indicating copy to clipboard operation
playwright copied to clipboard

feat: implement flag to fail flaky tests

Open Joe-Hendley opened this issue 9 months ago • 8 comments

Implements feature requested in https://github.com/microsoft/playwright/issues/30457

The test runner treats flaky tests as failures when the flag is enabled, but still reports flaky tests as flaky in the reporting interface. It feels like something worth discussing as this behaviour makes sense to me, but looked a bit odd to @BJSS-russell-pollock when I ran this past him.

Joe-Hendley avatar May 01 '24 09:05 Joe-Hendley

@microsoft-github-policy-service agree company="BJSS"

Joe-Hendley avatar May 01 '24 09:05 Joe-Hendley

Test results for "tests 1"

2 flaky :warning: [playwright-test] › ui-mode-test-watch.spec.ts:145:5 › should watch all
:warning: [webkit-library] › library/browsercontext-clearcookies.spec.ts:116:3 › should remove cookies by path

27252 passed, 671 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 01 '24 09:05 github-actions[bot]

Talking point of the failing on flaky behaviour, is about the reporter not indicating that the suite failed because of the flaky test. The exit code is 1 but the reporter doesn't look like it.

To try this out, I set the the playwright.config to:

retries: process.env.CI ? 2 : 5,
failOnFlakyTests: true,

And then the test:

test('flaky test', async () => {

  const randomFail = Math.round(Math.random());
  expect(randomFail).toBe(0)
});

When running the test through the CLI - the retry would pass and then mark the test as flaky: retry result flaky

BJSS-russell-pollock avatar May 01 '24 10:05 BJSS-russell-pollock

Hi there,

The intended behaviour from this PR, and my understanding of the issue matches yours - any changes in the code extra to that reflect my unfamiliarity with the codebase.

I think Russell's comment / the discussion around behaviour reflects the user / manual tester perspective where the change in exit code isn't immediately visible.

Joe-Hendley avatar May 01 '24 20:05 Joe-Hendley

The intended behaviour from this PR, and my understanding of the issue matches yours - any changes in the code extra to that reflect my unfamiliarity with the codebase.

Sounds great! In this case, I think we should plumb the new CLI option similar to --pass-with-no-tests. Search for passWithNoTests across the codebase for the reference. And this is a place that determines whether everything was successful or something has failed, and it needs to be tweaked.

@BJSS-russell-pollock I think this should be clear since reporter shows the "Flaky" count as non-zero. I'd assume anyone using this option will know that flakiness means "failed test run", because the behavior was opt in. Let me know what you think.

dgozman avatar May 02 '24 15:05 dgozman

@BJSS-russell-pollock I think this should be clear since reporter shows the "Flaky" count as non-zero. I'd assume anyone using this option will know that flakiness means "failed test run", because the behavior was opt in. Let me know what you think.

Absolutely makes sense.

BJSS-russell-pollock avatar May 02 '24 15:05 BJSS-russell-pollock

Sounds great! In this case, I think we should plumb the new CLI option similar to --pass-with-no-tests. Search for passWithNoTests across the codebase for the reference. And this is a place that determines whether everything was successful or something has failed, and it needs to be tweaked.

Excellent - I'll have a look and update it accordingly.

Joe-Hendley avatar May 02 '24 16:05 Joe-Hendley

Test results for "tests 1"

1 flaky :warning: [webkit-page] › page/workers.spec.ts:243:3 › should support offline

27321 passed, 661 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 08 '24 09:05 github-actions[bot]

Test results for "tests 1"

693 passed :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 09 '24 11:05 github-actions[bot]

Test results for "tests 1"

27328 passed, 661 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 09 '24 11:05 github-actions[bot]

Test results for "tests 1"

15 passed :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 09 '24 13:05 github-actions[bot]

Okay, reworked as suggested - much less additional code!

Joe-Hendley avatar May 09 '24 13:05 Joe-Hendley

Test results for "tests 1"

3 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable :x: [installation tests] › playwright-electron-should-work.spec.ts:18:5 › electron should work :x: [installation tests] › playwright-test-package-managers.spec.ts:54:5 › npm: uninstalling ct removes playwright bin

27325 passed, 661 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 09 '24 14:05 github-actions[bot]

Test results for "tests 1"

1 failed :x: [playwright-test] › runner.spec.ts:118:5 › should ignore subprocess creation error because of SIGINT

2 flaky :warning: [playwright-test] › ui-mode-test-watch.spec.ts:184:5 › should watch new file
:warning: [webkit-library] › library/browsercontext-clearcookies.spec.ts:52:3 › should remove cookies by name

27330 passed, 662 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 10 '24 10:05 github-actions[bot]

Test results for "tests 1"

2 flaky :warning: [chromium-library] › library/trace-viewer.spec.ts:908:1 › should display waitForLoadState even if did not wait for it
:warning: [playwright-test] › ui-mode-test-watch.spec.ts:96:5 › should batch watch updates

27331 passed, 662 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 13 '24 09:05 github-actions[bot]

@dgozman This should be ok for re-review when you're ready - the failed tests in previous pipeline runs look like noise to me, but happy to dig if you think it's deeper

Joe-Hendley avatar May 15 '24 08:05 Joe-Hendley

Test results for "tests 1"

1 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable

2 flaky :warning: [playwright-test] › ui-mode-test-screencast.spec.ts:21:5 › should show screenshots
:warning: [playwright-test] › ui-mode-test-watch.spec.ts:96:5 › should batch watch updates

27346 passed, 662 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:

Merge workflow run.

github-actions[bot] avatar May 15 '24 09:05 github-actions[bot]