node icon indicating copy to clipboard operation
node copied to clipboard

Investigate flaky test-http-server-request-timeouts-mixed

Open F3n67u opened this issue 1 year ago • 9 comments

Test

test-http-server-request-timeouts-mixed

Platform

macos

Console output

not ok 1173 parallel/test-http-server-request-timeouts-mixed
  ---
  duration_ms: 3.244
  severity: fail
  exitcode: 1
  stack: |-
    node:assert:399
        throw err;
        ^
    
    AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
    
      assert(request2.completed)
    
        at Timeout._onTimeout (/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/test/parallel/test-http-server-request-timeouts-mixed.js:106:5)
        at listOnTimeout (node:internal/timers:564:17)
        at process.processTimers (node:internal/timers:507:7) {
      generatedMessage: true,
      code: 'ERR_ASSERTION',
      actual: false,
      expected: true,
      operator: '=='
    }
    
    Node.js v19.0.0-pre
  ...

Build links

Failed PR

10 (nodejs/node#43329, nodejs/node#43380, nodejs/node#43366, nodejs/node#43455, nodejs/node#43190, nodejs/node#43176, nodejs/node#43374, nodejs/node#43363, nodejs/node#43417, nodejs/node#43216)

Appeared

test-orka-macos11-x64-1, test-nearform-macos10.15-x64-3, test-orka-macos10.15-x64-2, test-nearform-macos10.15-x64-1, test-orka-macos11-x64-2

First CI

https://ci.nodejs.org/job/node-test-pull-request/44587/

Last CI

https://ci.nodejs.org/job/node-test-pull-request/44664/

Example

https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45608/console

Additional information

Build links info is from CI Reliability 2022-06-18

This test was introduced on 2022-05-03: https://github.com/nodejs/node/pull/42893

This test starts flaky from 2022-05-04: https://github.com/nodejs/reliability/issues/271

Query all flaky dates: https://github.com/nodejs/reliability/issues?q=is%3Aissue+is%3Aopen+test-http-server-request-timeouts-mixed

cc @ShogunPanda(Test author) @lpinca(Reviewer, I saw you mention the flaky possibility in pr comment)

F3n67u avatar Jun 18 '22 02:06 F3n67u

It seems this only occurs on macOS but I am not able to reproduce locally on macOS 12.4.

lpinca avatar Jun 18 '22 05:06 lpinca

@ShogunPanda could d you take a look?

mcollina avatar Jun 18 '22 10:06 mcollina

Yup, I will take a look on Monday!

ShogunPanda avatar Jun 18 '22 10:06 ShogunPanda

It seems this only occurs on macOS but I am not able to reproduce locally on macOS 12.4.

Yes, I also tried to reproduce on my local MacOS 12.4 but never reproduced.

F3n67u avatar Jun 18 '22 10:06 F3n67u

I was able to repro on Mac 12.4:

=== release test-http-server-request-timeouts-mixed ===                       
Path: parallel/test-http-server-request-timeouts-mixed
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:107:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Let me know if there's anything I can do to help, even if it's just being a lab rat :)

nicksia-vgw avatar Aug 05 '22 10:08 nicksia-vgw

Thanks sir! Do you mind sharing which machine (hardware I mean) were you running on and all the relevant environment information (like if you were on battery or if your machine was overloaded and so forth)

ShogunPanda avatar Aug 05 '22 11:08 ShogunPanda

Sure thing - here's a screenshot of my system details: image

My machine was plugged into the wall.

Test was running in CLion's integrated terminal.

I'd say "low-medium" utilisation, had Postgres running in a Docker image and a IDE/Browser open (about 20% CPU usage and 20GB memory used).

Update: Fails even with request2 isolated.

nicksia-vgw avatar Aug 05 '22 11:08 nicksia-vgw

Here's some awfully crude dumps (sorry!), printing out the value of request2.completed every 100ms with setInterval.

Failed Run:

@[101ms] request2.completed: undefined
@[201ms] request2.completed: undefined
@[303ms] request2.completed: undefined
request2 client created.
request2.client.write()
@[413ms] request2.completed: false
<snip>
@[2840ms] request2.completed: false
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:81:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Node.js v19.0.0-pre

Passing Run

@[102ms] request2.completed: undefined
@[203ms] request2.completed: undefined
@[304ms] request2.completed: undefined
request2 client created.
request2.client.write()
@[412ms] request2.completed: false
<snip>
@[2439ms] request2.completed: false
@[2540ms] request2.completed: true
<snip>
@[6481ms] request2.completed: true
Closing server.

My passing runs are close to 2500ms after client is created, when the headersTimeout * 1.2 + connectionsCheckingInterval on my platform is 2900ms.

Could it be that the check is just too early on slow/burdened systems?

Update: I'm getting a bunch of runs around the 3000ms mark when I remove assert(request2.completed):

@[3039ms] request2.completed: true

nicksia-vgw avatar Aug 05 '22 12:08 nicksia-vgw

The times don't quite match up, which is causing this issue.

headersTimeout: 2000ms requestTimeout: 4000ms connectionsCheckingInterval: 500ms

Request 2 is started at headersTimeout * 0.2, or 400ms. Headers time out 2000ms after that. Completion of Request 2 is checked at at headersTimeout * 1.2 + connectionsCheckingInterval, or 2900ms.

Timeline @[400ms]: Request 2 client created and first write sent. @[2400ms]: Headers are timed out. @[2900ms]: Request 2 completion is checked (which is intended to depend on Request 2 expiring via headersTimeout, as seen in the comment - assert(request2.response.startsWith(responseTimeout)); // It is expired due to headersTimeout.

The problem is that the connectionsCheckingInterval is slightly too high. The connection checking interval could tick just after 2400ms, and then just after 2900ms.

Example

@[2424ms] request2.completed: false
checking connection timeout
@[2525ms] request2.completed: false
@[2625ms] request2.completed: false
@[2727ms] request2.completed: false
@[2828ms] request2.completed: false
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(request2.completed)

    at Timeout._onTimeout (/Users/nicksia/Documents/git/node/test/parallel/test-http-server-request-timeouts-mixed.js:108:5)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The fix is to ensure connection checking interval is less than 500ms, and to decouple the check timeout from connection checking interval.

I'll put up a fix for this shortly.

nicksia-vgw avatar Aug 08 '22 10:08 nicksia-vgw

Failed again here - https://ci.nodejs.org/job/node-test-commit-osx/49021/nodes=osx11-x64/testReport/junit/(root)/test/parallel_test_http_server_request_timeouts_mixed_/

mhdawson avatar Dec 02 '22 19:12 mhdawson

Failed on Ubuntu 22 - https://ci.nodejs.org/job/node-test-commit-linuxone/40388/

anonrig avatar Oct 18 '23 01:10 anonrig

Failed on Ubuntu 22 - https://ci.nodejs.org/job/node-test-commit-linuxone/40388/

No, this one failed on rhel8-s390x.

richardlau avatar Oct 18 '23 01:10 richardlau

Seen this several times on osx - https://ci.nodejs.org/job/node-test-commit-osx-arm/nodes=osx11/14659/ as an example

mhdawson avatar Nov 23 '23 18:11 mhdawson