Inexplicable and flaky userEvent action timeouts
Describe the bug
Since the version 3.1.2, some tests became flaky: userEvent (from @vitest/browser/context) actions lead to unexpected timeout errors.
These tests used to work in 3.1.1, the call logs do not reveal any issues, the screenshots look good, the actions should work, it's really inexplicable why these timeout errors start happening.
I cannot reproduce these errors at all locally, it's only happening in our CI, no clue why.
Here's a couple examples below:
TimeoutError: locator.hover: Timeout 581ms exceeded.
Call log:
- waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
- locator resolved to <body>…</body>
- attempting hover action
- waiting for element to be visible and stable
- element is visible and stable
- scrolling into view if needed
- done scrolling
It says "Timeout 581ms exceeded", but it's not like we set an explicit 581ms timeout limit, really odd.
TimeoutError: frame.dragAndDrop: Timeout 10000ms exceeded.
Call log:
- waiting for locator('div:nth-child(4) > .dragIconClassname_d1f2l6jt')
Here setting a 10s timeout limit, and it's still exceeded, even though it should pass/not timeout.
TimeoutError: locator.hover: Timeout 5000ms exceeded.
Call log:
- waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
- locator resolved to <body>↵ ↵↵</body>
- attempting hover action
- waiting for element to be visible and stable
- element is visible and stable
- scrolling into view if needed
- done scrolling
TimeoutError: locator.hover: Timeout 5000ms exceeded.
Call log:
- waiting for locator('[data-vitest="true"]').contentFrame().locator('body')
- locator resolved to <body>…</body>
- attempting hover action
- waiting for element to be visible and stable
Reproduction
A simple
await userEvent.hover(document.body, { position: { x: 0, y: 0 } });
can timeout in our CI since 3.1.2.
This is with the playwright provider.
System Info
System:
OS: Windows 11 10.0.26100
CPU: (32) x64 AMD Ryzen 9 9950X3D 16-Core Processor
Memory: 36.50 GB / 63.58 GB
Binaries:
Node: 23.11.0 - C:\Program Files\nodejs\node.EXE
npm: 11.3.0 - C:\Program Files\nodejs\npm.CMD
Browsers:
Edge: Chromium (131.0.2903.99)
npmPackages:
@vitejs/plugin-react: ^4.4.1 => 4.4.1
@vitest/browser: ^3.1.2 => 3.1.2
@vitest/coverage-v8: ^3.1.2 => 3.1.2
playwright: ~1.52.0 => 1.52.0
vite: ^6.3.2 => 6.3.2
vitest: ^3.1.2 => 3.1.2
vitest-browser-react: ^0.1.1 => 0.1.1
Used Package Manager
npm
Validations
- [x] Follow our Code of Conduct
- [x] Read the Contributing Guidelines.
- [x] Read the docs.
- [x] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- [x] Check that this is a concrete bug. For Q&A open a GitHub Discussion or join our Discord Chat Server.
- [x] The provided reproduction is a minimal reproducible example of the bug.
It says "Timeout 581ms exceeded", but it's not like we set an explicit 581ms timeout limit, really odd.
The actions timeouts are now affected by the test timeout. Previously they would carry over to other tests if the test timed out.
The issue as a whole might be related to scaling: #6512
The issue as a whole might be related to scaling
I don't think so, for example: hovering over document.body should never time out. 🤷♂
I had seen flaky tests when running experimental builds of the parallelization PR, so it's most likely related to this work instead.
Can actions in a test suite/tab affect other tabs?
The issue as a whole might be related to scaling
I don't think so, for example: hovering over
document.bodyshould never time out. 🤷♂ I had seen flaky tests when running experimental builds of the parallelization PR, so it's most likely related to this work instead. Can actions in a test suite/tab affect other tabs?
It it was related to parallelization, these tests would have been flaky even before that, we were always running tests in different tabs. But scaling was disabled in CI before 3.1.2, for example. Now it's always scaled.
I don't think so, for example: hovering over
document.bodyshould never time out. 🤷♂
By the way, does it time out if it's just a single test file? Or do you have other tests running?
The actions timeouts are now affected by the test timeout.
I wish it was clearer, right now it sounds like userEvent actions are timing out instead of the test itself.
But scaling was disabled in CI before 3.1.2, for example.
Even in the PR builds? https://github.com/vitest-dev/vitest/pull/7665#issuecomment-2721739132
We set the same viewport in both browser.viewport and browser.instances[0].context.viewport, I don't think anything's cut off. Disabling headless and ui locally looks correct at least.
Since the tests are flaky, I don't think it's a scaling issue, otherwise they would consistently fail.
By the way, does it time out if it's just a single test file?
Ran a build in CI with maxWorkers: 1 and all the tests passed, no issues whatsoever.
Or do you have other tests running?
It's just vitest running, though not 100% sure if the CI machine is busy doing anything else.
maxWorkers |
testTimeout |
pass / fail |
|---|---|---|
| 1 | Pass ✅ | |
| 2 | Pass ✅ | |
| 4 | Pass ✅ | |
| 8 | Pass ✅ | |
| 16 | Fail ⛔ | |
| 16 | 20_000 | Fail ⛔ |
| 16 | 60_000 | Pass ✅ |
Seems like we can just increase testTimeout.
As I understand it, it's the test suite that times out, rather than individual tests.
Maybe a separate testSuiteTimeout setting, and clearer error messages (Test timed out -> Test suite timed out) would help with understanding timeout errors.
Feel free to close the issue, though it'd still be nice to clarify where/why timeouts happen, as right not it can look like there's an issue with userEvent.
Looking at your code, it just seems that timeouts happen because the CI is too busy.
As I understand it, it's the test suite that times out, rather than individual tests.
I don't think this is the case since userEvent is the one that times out in all your code examples. userEvent is now affected by the test timeout, but it doesn't matter in the end - previously your test timeout would happen earlier and you wouldn't see the call log and a stack trace from playwright, right now you do (so if they were not related, you would see a test timeout in all your examples instead, which is more confusing).
Maybe your tests passed previously because the parallelization stopped sooner and CI was not that busy. How it worked before: vitest split all tests into chunks and opened maxWorkers tabs at the same time. Then these tabs would run tests assigned to them independently. When the tab finished running tests, it would stay idle. Now, vitest opens maxWorkers tabs at the same time and always runs tests in all of them at all time - when the tab is not busy, it will start running a new test (until there are less tests than there are workers - maybe we should also close the tab in this case 🤔 )
Maybe a separate testSuiteTimeout setting, and clearer error messages (Test timed out -> Test suite timed out) would help with understanding timeout errors.
I don't know what you mean, there is no such thing as a suite timeout.
Seeing similar issues in Vitest Ecosystem CI: https://github.com/vitest-dev/vitest-ecosystem-ci/actions/runs/15104907077/job/42451860521 with https://github.com/stackblitz/webcontainer-test. Timeouts start to happen with 3.1.2 and later versions. 3.1.1 works all the time.
Need to check commit-by-commit to bisect between these two:
- https://github.com/vitest-dev/vitest/pull/6512
- https://github.com/vitest-dev/vitest/pull/7665
@nstepien do you have any vi.useFakeTimers() or vi.mock() calls that are not restored in the same test file? Try adding additional restores for those and see if it fixes the flakiness.
import { onTestFinished, test, vi } from "vitest";
test("some test", async () => {
onTestFinished(() => {
vi.restoreAllMocks();
vi.useRealTimers();
});
vi.mock(...);
vi.useFakeTimers();
We don't use vi.mock() at all, and only one test suite uses vi.useFakeTimers().
We already call vi.useRealTimers() in a global afterEach.
I don't think those are related.
That helped in the case of https://github.com/vitest-dev/vitest/issues/7871#issuecomment-2889689649 - the new parallelization improvements revealed a bug in the test cases.
So now we no longer have a reproduction for this specific issue.
We are also starting to get inexplicable flakiness in our unit tests.
We always get TypeError: [Function ...] is not a spy or a call to a spy! when they are properly set as spies.
This is one of the test file that failed at this test :
TypeError: [Function createBaseStore] is not a spy or a call to a spy!
Failure screenshot:
- src/components/commerce/atomic-commerce-recommendation-interface/__screenshots__/store.spec.ts/-createCommerceRecommendationStore-should-call--createBaseStore-with-the-correct-arguments-1.png
❯ src/components/commerce/atomic-commerce-recommendation-interface/store.spec.ts:23:28
21| it('should call #createBaseStore with the correct arguments', () => {
22| expect(store).toBeDefined();
23| expect(createBaseStore).toHaveBeenCalledExactlyOnceWith({
| ^
24| loadingFlags: [],
25| iconAssetsPath: '',
This is the specific test file : https://github.com/coveo/ui-kit/blob/master/packages/atomic/src/components/commerce/atomic-commerce-recommendation-interface/store.spec.ts
Are we doing something wrong with this test or is there a problematic race condition somewhere ?
After migrating approx. 1500 tests from toHaveBeenCalledWith to toHaveBeenCalledExactlyOnceWith I am running into memory leaks. I am not sure if this is related.
I am seeing something similar to this since trying to add sharding in CI. One of my shards consistently fails on a couple of tests that look very similar to many other tests we have. During the test runs I see it time out taking a screenshot of the page body somehow.
stderr | test a
[vitest] Failed to take a screenshot [TimeoutError: locator.screenshot: Timeout 3000ms exceeded.
Call log:
- waiting for frameLocator('[data-vitest="true"]').locator('body')
]
# other tests
× test a 17849ms
→ locator.click: Timeout 3000ms exceeded.
Call log:
- waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
- locator resolved to <button type="button">Filters</button>
× test b 4430ms
→ locator.click: Timeout 3000ms exceeded.
Call log:
- waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
- locator resolved to <button type="button">Filters</button>
And then at the summary
FAIL browser (chromium) test a
TimeoutError: locator.click: Timeout 3000ms exceeded.
Call log:
- waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
- locator resolved to <button type="button">Filters</button>
FAIL browser (chromium) test b
TimeoutError: locator.click: Timeout 3000ms exceeded.
Call log:
- waiting for frameLocator('[data-vitest="true"]').getByRole('button', { name: 'Filters' })
- locator resolved to <button type="button">Filters</button>
Using vitest 3.2.4
We had an actionTimeout of 3000ms in our vite config. Removing that in CI makes it pass, although I'm surprised it's consistently failing this specific pair of tests. Hopefully that's helpful to someone else