test: fix flaky timeout in test-pipe-file-to-http
Attempt to fix: https://github.com/nodejs/node/issues/52963
There are a few places that could cause the test to time out; first of all, relying on process.exit() was probably not a great idea.
The timeout was never cleared, and 10 MB might be a bit heavy on a lower-power machine.
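A minimal sketch of the kind of change described above (the names, payload size, and timeout value here are illustrative assumptions, not the actual contents of test-pipe-file-to-http):
'use strict';
// Illustrative sketch only: names, payload size and timeout value are assumptions.
const http = require('http');

// A smaller payload than the original 10 MB, to ease load on low-power machines.
const payload = Buffer.alloc(1024 * 1024, 'x');

const server = http.createServer((req, res) => {
  res.end(payload);
});

// Watchdog timer that is actually cleared on success, so it can neither keep the
// event loop alive nor fire after the test has finished.
const watchdog = setTimeout(() => {
  throw new Error('test timed out');
}, 60_000);

server.listen(0, () => {
  http.get({ port: server.address().port }, (res) => {
    res.resume();
    res.on('end', () => {
      clearTimeout(watchdog);
      // Let the process exit naturally once all handles are closed,
      // instead of forcing it with process.exit().
      server.close();
    });
  });
});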
Tried another 4000 runs with no failures, so it should be stable now?
Will actively monitor the issue once this PR lands to see if there is still any failure in CI.
The previous attempt PR was closed and cannot be re-opened.
Are you still able to get timeouts on main without this patch?
Yes, I can still get a timeout on main; it is reproducible within every 1000 runs.
When the test times out, is the 'end' event emitted on the server response? You can check that by adding something like this:
res.on('end', () => {
  console.log('end');
  server.close();
});
Can I ask how to get ./tools/test.py to print console.log output? I tried the -v flag (./tools/test.py --repeat=1000 -v test/parallel/test-pipe-file-to-http.js), but I still cannot see the output from console.log.
It will do it automatically for failed tests. You should see something like "TIMEOUT" and then the console.log() stuff.
end is emitted
Path: parallel/test-pipe-file-to-http
end
Command: out/Release/node node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
Ok, then it is not related to the file size either. The process for some reason (https://github.com/nodejs/node/issues/52964#issuecomment-2106317993, https://github.com/nodejs/node/pull/52959#issuecomment-2115591434) does not exit.
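One way to narrow that down is to log what the event loop is still waiting on when the hang happens, for example by dropping something like this into the test (diagnostic only, not part of the patch; process.getActiveResourcesInfo() needs Node.js 17.3.0 or later, and the 30-second delay is arbitrary):
// Throwaway diagnostic: print what is still keeping the event loop alive.
const probe = setTimeout(() => {
  console.error('still alive, active resources:', process.getActiveResourcesInfo());
}, 30_000);
probe.unref(); // the probe itself must not keep the process alive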
I tried a few more runs; sometimes the end is not printed though.
My code snippet with the console.log calls:
res.on('end', () => {
  console.log('end called');
  assert.strictEqual(count, fileSize);
  console.log('closing the server');
  server.close();
  console.log('server should be closed');
});
Error logs
=== release test-pipe-file-to-http ===
Path: parallel/test-pipe-file-to-http
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
=== release test-pipe-file-to-http ===
Path: parallel/test-pipe-file-to-http
end called
closing the server
server should be closed
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
=== release test-pipe-file-to-http ===
Path: parallel/test-pipe-file-to-http
end called
closing the server
server should be closed
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
=== release test-pipe-file-to-http ===
Path: parallel/test-pipe-file-to-http
Command: out/Release/node ./node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
I have also tried the --jitless flag again, and I found it interesting that when the test times out it says Warning: disabling flag --expose_wasm due to conflicting flags, even though I have not used --expose_wasm anywhere.
=== release test-pipe-file-to-http ===
Path: parallel/test-pipe-file-to-http
Warning: disabling flag --expose_wasm due to conflicting flags
Command: out/Release/node --jitless ./oss/node/test/parallel/test-pipe-file-to-http.js
--- TIMEOUT ---
Do you have any idea why Warning: disabling flag --expose_wasm due to conflicting flags showed up when I only used --jitless? 👀
No, I don't know.
Some update:
I have been running this test on a Linux Ubuntu x64 machine for more than 100k runs with no failure.
But running on my M1 arm64 machine gives a fairly consistent 0.1% failure rate. With --jitless I see the Warning: disabling flag --expose_wasm due to conflicting flags message when the test times out; I am not sure if it is related, and the closest thing I can find is this, so maybe the failure is related to the ARM build only?
May I ask what environment you are using @lpinca?
Will be looking into this for another day, I think we can probably add this test to the flaky test status until we can find more clues. Any suggestions?
May I ask what environment you are using @lpinca?
$ npx envinfo --system
System:
OS: macOS 14.5
CPU: (16) x64 Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
Memory: 20.24 GB / 32.00 GB
Shell: 5.2.26 - /usr/local/bin/bash
Will be looking into this for another day, I think we can probably add this test to the flaky test status until we can find more clues. Any suggestions?
The last CI failure occurred on ARM on 2024-06-22 (https://github.com/nodejs/reliability/issues/903). I think we should mark it flaky on that platform.
Marked as flaky in #53751; hopefully we can find the root cause.
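For reference, marking the test flaky means adding an entry to test/parallel/parallel.status; it looks something along these lines (the exact platform condition here is an assumption, matching where the failures were seen):
[$arch==arm64]
# https://github.com/nodejs/node/issues/52963
test-pipe-file-to-http: PASS,FLAKY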