node icon indicating copy to clipboard operation
node copied to clipboard

test: deflake http-server-request-timeout test

Open nicksia-vgw opened this issue 1 year ago • 1 comments

parallel/http-server-request-timeouts-mixed test was sometimes failing due to insufficient tolerance between the connection timeout checking interval, and the expected timeout specified in the test.

The checking interval was 500ms, and the request was checked for timeout exactly 500ms after the request was expected to timeout. This led to a timing condition where the next check would occur slightly after the request was expected to timeout.

Detailed Explanation Timings headersTimeout: 2000ms requestTimeout: 4000ms connectionsCheckingInterval: 500ms

Request 2 is started at headersTimeout * 0.2, or 400ms. Headers time out 2000ms after that. Completion of Request 2 is checked at at headersTimeout * 1.2 + connectionsCheckingInterval, or 2900ms.

Timeline @[400ms]: Request 2 client created and first write sent. @[2400ms]: Headers are timed out. @[2900ms]: Request 2 completion is checked (which is intended to depend on Request 2 expiring via headersTimeout, as seen in the comment - assert(request2.response.startsWith(responseTimeout)); // It is expired due to headersTimeout.

The problem is that the connectionsCheckingInterval is slightly too high. The connection checking interval could tick just after 2400ms, and then just after 2900ms.

Example

@[2424ms] request2.completed: false
checking connection timeout
@[2525ms] request2.completed: false
@[2625ms] request2.completed: false
@[2727ms] request2.completed: false
@[2828ms] request2.completed: false
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:108:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The fix is to ensure connection checking interval is less than 500ms, and to decouple the check timeout from connection checking interval.

nicksia-vgw avatar Aug 08 '22 10:08 nicksia-vgw

CI: https://ci.nodejs.org/job/node-test-pull-request/45934/

nodejs-github-bot avatar Aug 08 '22 19:08 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/45981/

nodejs-github-bot avatar Aug 10 '22 18:08 nodejs-github-bot

Superseds https://github.com/nodejs/node/pull/43470.

lpinca avatar Aug 10 '22 19:08 lpinca

CI: https://ci.nodejs.org/job/node-test-pull-request/46003/

nodejs-github-bot avatar Aug 11 '22 15:08 nodejs-github-bot

Could I get a hand getting this merged please?

nicksia-vgw avatar Aug 14 '22 08:08 nicksia-vgw

@nicksia-vgw I see this has still failed on osx: https://ci.nodejs.org/job/node-test-commit-osx/46808/ are we ok with that?

MoLow avatar Aug 14 '22 08:08 MoLow

Oh sorry - I missed that. Let me take a look.

nicksia-vgw avatar Aug 14 '22 08:08 nicksia-vgw

I think for timing sensitive tests like these, it might be more reliable when run as part of the sequential suite, to avoid issues caused by delays due to resource contention.

Any concerns with merging this PR (it will reduce failure rate), then moving these failing tests to the sequential suite in a follow-up PR?

  • parallel_test_http_server_request_timeouts_mixed
  • parallel_test_http_server_headers_timeout_keepalive

itsnicksia avatar Aug 14 '22 10:08 itsnicksia

OK - I'm ready to merge.

itsnicksia avatar Aug 14 '22 12:08 itsnicksia

Landed in b84a6338b477.

lpinca avatar Aug 15 '22 18:08 lpinca

This depends on https://github.com/nodejs/node/pull/42893, which is marked as "dont-land-on-v16.x".

juanarbol avatar Sep 30 '22 03:09 juanarbol