node icon indicating copy to clipboard operation
node copied to clipboard

test: fix flaky timeout in test-pipe-file-to-http

Open jakecastelli opened this issue 1 year ago • 10 comments
trafficstars

Attempt to fix: https://github.com/nodejs/node/issues/52963

There are a few places could cause the test to timeout, first of all rely on process.exit was probably not a great idea.

Timeout was never cleared, 10mb might be a bit heavy on lower power machine.

Tried another 4000 runs, no failures, should be stable now?

Will actively monitor the issue once this PR lands to see if there is still any failure in CI.

Previous attempt PR was closed and cannot re-opened.

jakecastelli avatar Jun 26 '24 16:06 jakecastelli

Are you still able to get timeouts on main without this patch?

lpinca avatar Jun 27 '24 12:06 lpinca

Yes, I can still get timeout on main - it is reproducible in every 1000 run.

jakecastelli avatar Jun 27 '24 12:06 jakecastelli

Yes, I can still get timeout on main - it is reproducible in every 1000 run.

When the test times out, is the 'end' event emitted on the server response? You can check that by adding something like this:

res.on('end',  () => {
  console.log('end');
  server.close();
});

lpinca avatar Jun 27 '24 12:06 lpinca

Can I ask how to ask the ./tools/test.py to print console.logs, I tried with -v flag ./tools/test.py --repeat=1000 -v test/parallel/test-pipe-file-to-http.js but I still cannot see the console output from console.log

jakecastelli avatar Jun 27 '24 13:06 jakecastelli

It will do it automatically for failed tests. You should see something like "TIMEOUT" and then the console.log() stuff.

lpinca avatar Jun 27 '24 13:06 lpinca

end is emitted

Path: parallel/test-pipe-file-to-http
end
Command: out/Release/node node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---

jakecastelli avatar Jun 27 '24 13:06 jakecastelli

Ok, then it is not related to the file size either. The process for some reason (https://github.com/nodejs/node/issues/52964#issuecomment-2106317993, https://github.com/nodejs/node/pull/52959#issuecomment-2115591434) does not exit.

lpinca avatar Jun 27 '24 13:06 lpinca

I tried a few more runs - sometimes the end is not printed though

my code snippet for conosle log
    res.on('end', () => {
      console.log('end called');

      assert.strictEqual(count, fileSize);

      console.log('closing the server');

      server.close();

      console.log('server should be closed')
    });
Error logs
=== release test-pipe-file-to-http ===                    
Path: parallel/test-pipe-file-to-http
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---


=== release test-pipe-file-to-http ===                    
Path: parallel/test-pipe-file-to-http
end called
closing the server
server should be closed
Command: out/Release/node./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---

=== release test-pipe-file-to-http ===                    
Path: parallel/test-pipe-file-to-http
end called
closing the server
server should be closed
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---


=== release test-pipe-file-to-http ===                    
Path: parallel/test-pipe-file-to-http
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---

I have also tried the --jitless flag again, and I found interesting when the test is timeout it says Warning: disabling flag --expose_wasm due to conflicting flags but I have not used --expose_wasm anywhere.

=== release test-pipe-file-to-http ===                    
Path: parallel/test-pipe-file-to-http
Warning: disabling flag --expose_wasm due to conflicting flags
Command: out/Release/node --jitless ./oss/node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---

jakecastelli avatar Jun 28 '24 02:06 jakecastelli

Do you have any idea why Warning: disabling flag --expose_wasm due to conflicting flags showed up when I only used --jitless 👀

jakecastelli avatar Jun 29 '24 14:06 jakecastelli

No, I don't know.

lpinca avatar Jun 29 '24 18:06 lpinca

Some update:

Been running this test on a Linux ubuntu x64 machine, more than 100k+ runs, no failure.

But running on my m1 arm64 will have a quite consistent 0.1% failure rate. With --jitless I would see the Warning: disabling flag --expose_wasm due to conflicting flags when test timeout, not sure if it is related, the closest thing I can find is this, maybe the failure is related to arm build only?

May I ask what environment you are using @lpinca?

Will be looking into this for another day, I think we can probably add this test to the flaky test status until we can find more clues. Any suggestions?

jakecastelli avatar Jul 01 '24 01:07 jakecastelli

May I ask what environment you are using @lpinca?

$ npx envinfo --system

  System:
    OS: macOS 14.5
    CPU: (16) x64 Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
    Memory: 20.24 GB / 32.00 GB
    Shell: 5.2.26 - /usr/local/bin/bash

lpinca avatar Jul 01 '24 16:07 lpinca

Will be looking into this for another day, I think we can probably add this test to the flaky test status until we can find more clues. Any suggestions?

The last CI failure occurred on ARM on 2024-06-22 (https://github.com/nodejs/reliability/issues/903). I think we mark it flaky on that platform.

lpinca avatar Jul 01 '24 16:07 lpinca

Marked as flaky in 53751, hopefully we can find what the root cause is.

jakecastelli avatar Jul 07 '24 12:07 jakecastelli