vitest icon indicating copy to clipboard operation
vitest copied to clipboard

Inexplicable and flaky userEvent action timeouts

Open nstepien opened this issue 8 months ago • 11 comments

Describe the bug

Since the version 3.1.2, some tests became flaky: userEvent (from @vitest/browser/context) actions lead to unexpected timeout errors. These tests used to work in 3.1.1, the call logs do not reveal any issues, the screenshots look good, the actions should work, it's really inexplicable why these timeout errors start happening. I cannot reproduce these errors at all locally, it's only happening in our CI, no clue why.

Here's a couple examples below:

TimeoutError: locator.hover: Timeout 581ms exceeded.
Call log:
  - waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
    - locator resolved to <body>…</body>
  - attempting hover action
    - waiting for element to be visible and stable
    - element is visible and stable
    - scrolling into view if needed
    - done scrolling

It says "Timeout 581ms exceeded", but it's not like we set an explicit 581ms timeout limit, really odd.

TimeoutError: frame.dragAndDrop: Timeout 10000ms exceeded.
Call log:
  - waiting for locator('div:nth-child(4) > .dragIconClassname_d1f2l6jt')

Here setting a 10s timeout limit, and it's still exceeded, even though it should pass/not timeout.

TimeoutError: locator.hover: Timeout 5000ms exceeded.
Call log:
  - waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
    - locator resolved to <body>↵  ↵↵</body>
  - attempting hover action
    - waiting for element to be visible and stable
    - element is visible and stable
    - scrolling into view if needed
    - done scrolling
TimeoutError: locator.hover: Timeout 5000ms exceeded.
Call log:
  - waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
    - locator resolved to <body>…</body>
  - attempting hover action
    - waiting for element to be visible and stable

Reproduction

A simple

await userEvent.hover(document.body, { position: { x: 0, y: 0 } });

can timeout in our CI since 3.1.2.

This is with the playwright provider.

System Info

System:
    OS: Windows 11 10.0.26100
    CPU: (32) x64 AMD Ryzen 9 9950X3D 16-Core Processor
    Memory: 36.50 GB / 63.58 GB
  Binaries:
    Node: 23.11.0 - C:\Program Files\nodejs\node.EXE
    npm: 11.3.0 - C:\Program Files\nodejs\npm.CMD
  Browsers:
    Edge: Chromium (131.0.2903.99)
  npmPackages:
    @vitejs/plugin-react: ^4.4.1 => 4.4.1
    @vitest/browser: ^3.1.2 => 3.1.2
    @vitest/coverage-v8: ^3.1.2 => 3.1.2
    playwright: ~1.52.0 => 1.52.0
    vite: ^6.3.2 => 6.3.2
    vitest: ^3.1.2 => 3.1.2
    vitest-browser-react: ^0.1.1 => 0.1.1

Used Package Manager

npm

Validations

nstepien avatar Apr 22 '25 15:04 nstepien

It says "Timeout 581ms exceeded", but it's not like we set an explicit 581ms timeout limit, really odd.

The actions timeouts are now affected by the test timeout. Previously they would carry over to other tests if the test timed out.

The issue as a whole might be related to scaling: #6512

sheremet-va avatar Apr 24 '25 15:04 sheremet-va

The issue as a whole might be related to scaling

I don't think so, for example: hovering over document.body should never time out. 🤷‍♂ I had seen flaky tests when running experimental builds of the parallelization PR, so it's most likely related to this work instead. Can actions in a test suite/tab affect other tabs?

nstepien avatar Apr 24 '25 15:04 nstepien

The issue as a whole might be related to scaling

I don't think so, for example: hovering over document.body should never time out. 🤷‍♂ I had seen flaky tests when running experimental builds of the parallelization PR, so it's most likely related to this work instead. Can actions in a test suite/tab affect other tabs?

It it was related to parallelization, these tests would have been flaky even before that, we were always running tests in different tabs. But scaling was disabled in CI before 3.1.2, for example. Now it's always scaled.

sheremet-va avatar Apr 24 '25 15:04 sheremet-va

I don't think so, for example: hovering over document.body should never time out. 🤷‍♂

By the way, does it time out if it's just a single test file? Or do you have other tests running?

sheremet-va avatar Apr 24 '25 16:04 sheremet-va

The actions timeouts are now affected by the test timeout.

I wish it was clearer, right now it sounds like userEvent actions are timing out instead of the test itself.

But scaling was disabled in CI before 3.1.2, for example.

Even in the PR builds? https://github.com/vitest-dev/vitest/pull/7665#issuecomment-2721739132 We set the same viewport in both browser.viewport and browser.instances[0].context.viewport, I don't think anything's cut off. Disabling headless and ui locally looks correct at least. Since the tests are flaky, I don't think it's a scaling issue, otherwise they would consistently fail.

By the way, does it time out if it's just a single test file?

Ran a build in CI with maxWorkers: 1 and all the tests passed, no issues whatsoever.

Or do you have other tests running?

It's just vitest running, though not 100% sure if the CI machine is busy doing anything else.

nstepien avatar Apr 24 '25 18:04 nstepien

maxWorkers testTimeout pass / fail
1 Pass ✅
2 Pass ✅
4 Pass ✅
8 Pass ✅
16 Fail ⛔
16 20_000 Fail ⛔
16 60_000 Pass ✅

Seems like we can just increase testTimeout. As I understand it, it's the test suite that times out, rather than individual tests. Maybe a separate testSuiteTimeout setting, and clearer error messages (Test timed out -> Test suite timed out) would help with understanding timeout errors.

Feel free to close the issue, though it'd still be nice to clarify where/why timeouts happen, as right not it can look like there's an issue with userEvent.

nstepien avatar Apr 24 '25 19:04 nstepien

Looking at your code, it just seems that timeouts happen because the CI is too busy.

As I understand it, it's the test suite that times out, rather than individual tests.

I don't think this is the case since userEvent is the one that times out in all your code examples. userEvent is now affected by the test timeout, but it doesn't matter in the end - previously your test timeout would happen earlier and you wouldn't see the call log and a stack trace from playwright, right now you do (so if they were not related, you would see a test timeout in all your examples instead, which is more confusing).

Maybe your tests passed previously because the parallelization stopped sooner and CI was not that busy. How it worked before: vitest split all tests into chunks and opened maxWorkers tabs at the same time. Then these tabs would run tests assigned to them independently. When the tab finished running tests, it would stay idle. Now, vitest opens maxWorkers tabs at the same time and always runs tests in all of them at all time - when the tab is not busy, it will start running a new test (until there are less tests than there are workers - maybe we should also close the tab in this case 🤔 )

Maybe a separate testSuiteTimeout setting, and clearer error messages (Test timed out -> Test suite timed out) would help with understanding timeout errors.

I don't know what you mean, there is no such thing as a suite timeout.

sheremet-va avatar Apr 24 '25 19:04 sheremet-va

Seeing similar issues in Vitest Ecosystem CI: https://github.com/vitest-dev/vitest-ecosystem-ci/actions/runs/15104907077/job/42451860521 with https://github.com/stackblitz/webcontainer-test. Timeouts start to happen with 3.1.2 and later versions. 3.1.1 works all the time.

Need to check commit-by-commit to bisect between these two:

  • https://github.com/vitest-dev/vitest/pull/6512
  • https://github.com/vitest-dev/vitest/pull/7665

AriPerkkio avatar May 19 '25 05:05 AriPerkkio

@nstepien do you have any vi.useFakeTimers() or vi.mock() calls that are not restored in the same test file? Try adding additional restores for those and see if it fixes the flakiness.

import { onTestFinished, test, vi } from "vitest";

test("some test", async () => {
  onTestFinished(() => {
    vi.restoreAllMocks();
    vi.useRealTimers();
  });

   vi.mock(...);
   vi.useFakeTimers();

AriPerkkio avatar Jun 04 '25 09:06 AriPerkkio

We don't use vi.mock() at all, and only one test suite uses vi.useFakeTimers(). We already call vi.useRealTimers() in a global afterEach. I don't think those are related.

nstepien avatar Jun 04 '25 10:06 nstepien

That helped in the case of https://github.com/vitest-dev/vitest/issues/7871#issuecomment-2889689649 - the new parallelization improvements revealed a bug in the test cases.

So now we no longer have a reproduction for this specific issue.

AriPerkkio avatar Jun 04 '25 16:06 AriPerkkio

We are also starting to get inexplicable flakiness in our unit tests.

We always get TypeError: [Function ...] is not a spy or a call to a spy! when they are properly set as spies.

This is one of the test file that failed at this test :

  TypeError: [Function createBaseStore] is not a spy or a call to a spy!
  
  Failure screenshot:
    - src/components/commerce/atomic-commerce-recommendation-interface/__screenshots__/store.spec.ts/-createCommerceRecommendationStore-should-call--createBaseStore-with-the-correct-arguments-1.png
  
   ❯ src/components/commerce/atomic-commerce-recommendation-interface/store.spec.ts:23:28
       21|   it('should call #createBaseStore with the correct arguments', () => {
       22|     expect(store).toBeDefined();
       23|     expect(createBaseStore).toHaveBeenCalledExactlyOnceWith({
         |                            ^
       24|       loadingFlags: [],
       25|       iconAssetsPath: '',

This is the specific test file : https://github.com/coveo/ui-kit/blob/master/packages/atomic/src/components/commerce/atomic-commerce-recommendation-interface/store.spec.ts

Are we doing something wrong with this test or is there a problematic race condition somewhere ?

alexprudhomme avatar Jul 23 '25 21:07 alexprudhomme

After migrating approx. 1500 tests from toHaveBeenCalledWith to toHaveBeenCalledExactlyOnceWith I am running into memory leaks. I am not sure if this is related.

HaNdTriX avatar Oct 08 '25 15:10 HaNdTriX

I am seeing something similar to this since trying to add sharding in CI. One of my shards consistently fails on a couple of tests that look very similar to many other tests we have. During the test runs I see it time out taking a screenshot of the page body somehow.

stderr | test a
[vitest] Failed to take a screenshot [TimeoutError: locator.screenshot: Timeout 3000ms exceeded.
Call log:
  - waiting for frameLocator('[data-vitest="true"]').locator('body')
]

# other tests

   × test a 17849ms
     → locator.click: Timeout 3000ms exceeded.
Call log:
  - waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
  -   locator resolved to <button type="button">Filters</button>

   × test b 4430ms
     → locator.click: Timeout 3000ms exceeded.
Call log:
  - waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
  -   locator resolved to <button type="button">Filters</button>

And then at the summary

 FAIL   browser (chromium)  test a
TimeoutError: locator.click: Timeout 3000ms exceeded.
Call log:
  - waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
  -   locator resolved to <button type="button">Filters</button>


 FAIL   browser (chromium)  test b
TimeoutError: locator.click: Timeout 3000ms exceeded.
Call log:
  - waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
  -   locator resolved to <button type="button">Filters</button>

Using vitest 3.2.4

mjewell avatar Oct 23 '25 18:10 mjewell

We had an actionTimeout of 3000ms in our vite config. Removing that in CI makes it pass, although I'm surprised it's consistently failing this specific pair of tests. Hopefully that's helpful to someone else

mjewell avatar Oct 23 '25 19:10 mjewell