poc: original request is not aborted on bodyTimeout (only retry handler?)
This PR is far from perfect! It actually started as an attempt to deflake test/issue-3356.js.
Please have a look at the code and the tests. It is hard to explain:
It seems that the flakiness actually points to an underlying issue. If we get a bodyTimeout, it doesn't mean that the connection is closed. It just means that the body did not finish in the expected time. OK, no problem if the connection closes after some time. But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.
So I assume, we have to abort the request before we retry with a new request. Maybe even check how much data we buffered already, and set the corresponding content-range headers.
But tbh, I am kind of lost in this part of the code, so I am showing you this; maybe you have better ideas.
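To make the scenario concrete, here is a minimal sketch of what I mean (a standalone illustration, not the test from this PR; the port, timings and retry options are made up):

```js
// Sketch: the server sends headers and part of the body, then stalls past the
// client's bodyTimeout. With UND_ERR_BODY_TIMEOUT configured as a retryable
// error code, the RetryAgent issues a second request, while the first
// connection is (as far as I can tell) not necessarily torn down.
const { createServer } = require('node:http')
const { Agent, RetryAgent, request } = require('undici')

const server = createServer((req, res) => {
  console.log('request received')
  res.writeHead(200, { 'content-type': 'text/plain' })
  res.write('partial body')
  // never end the response, so the client's bodyTimeout fires
})

server.listen(0, async () => {
  const origin = `http://localhost:${server.address().port}`
  const dispatcher = new RetryAgent(
    new Agent({ bodyTimeout: 500 }),
    { maxRetries: 1, errorCodes: ['UND_ERR_BODY_TIMEOUT'] }
  )

  try {
    const { body } = await request(origin, { dispatcher })
    console.log(await body.text())
  } catch (err) {
    console.error('failed with', err.code)
  } finally {
    await dispatcher.destroy()
    server.closeAllConnections?.()
    server.close()
  }
})
```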
This relates to...
Rationale
Changes
Features
Bug Fixes
Breaking Changes and Deprecations
Status
- [ ] I have read and agreed to the Developer's Certificate of Origin
- [ ] Tested
- [ ] Benchmarked (optional)
- [ ] Documented
- [ ] Review ready
- [ ] In review
- [ ] Merge ready
IMHO I do not think the solution you are looking for is entirely in the retry handler. You described two issues in the description of the PR:
But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.
How can it happen, if you destroy the socket on bodyTimeout?
util.destroy(socket, new BodyTimeoutError())
Maybe even check how much data we buffered already, and set the corresponding content-range headers.
That's the only implementation that should be in the retry handler. There is no reason to stop the retry process on body timeout, AFAIK.
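Something along these lines is what I have in mind (a hand-written sketch, not the actual RetryHandler code; all names are made up):

```js
// Sketch only: track how many body bytes were already consumed, and on the
// next attempt ask the server to continue from that offset.
let bytesReceived = 0

function onBodyChunk (chunk) {
  bytesReceived += chunk.length
}

function buildRetryHeaders (originalHeaders) {
  if (bytesReceived === 0) {
    // nothing consumed yet, a plain retry is enough
    return originalHeaders
  }
  return {
    ...originalHeaders,
    // open-ended range: "everything from byte `bytesReceived` onwards"
    range: `bytes=${bytesReceived}-`
  }
}
```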
@metcoder95
I personally lack insight into these parts of the code, and I think it would be great if you would investigate it further. Should I close this PR?
@fatal10110
I don't think the socket gets destroyed. I would need to investigate, though. But I guess it is because we are not working directly on the h1 client? Idk.
Anyhow, IMHO undici is acting strange. This PR was just a PoC. Maybe everything is fine and I am wrong...
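One way I would probably check it (a rough sketch only; timings and logging are made up and not part of this PR):

```js
// Sketch: does the server observe the connection closing after the client's
// bodyTimeout fires? Plain Agent, no retries involved.
const { createServer } = require('node:http')
const { Agent, request } = require('undici')

const server = createServer((req, res) => {
  res.writeHead(200)
  res.write('partial')
  // stall: never write again and never end the response
  req.socket.on('close', () => console.log('server: socket closed'))
})

server.listen(0, async () => {
  const origin = `http://localhost:${server.address().port}`
  const dispatcher = new Agent({ bodyTimeout: 300 })

  try {
    const { body } = await request(origin, { dispatcher })
    await body.text()
  } catch (err) {
    console.log('client: body errored with', err.code) // expect UND_ERR_BODY_TIMEOUT
  }

  // give the server a moment to (not) observe the close, then clean up
  setTimeout(() => {
    dispatcher.destroy()
    server.closeAllConnections?.()
    server.close()
  }, 1000)
})
```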
Is there a way / test to reproduce the issue you are describing? Running the test on the main branch, it always passes.
@artur-ma
This is what I see when I run my modified test on main:
aras@aras-HP-ZBook-15-G3:~/workspace/undici$ node test/issue-3356.js
✖ https://github.com/nodejs/undici/issues/3356 (1525.138235ms)
AssertionError [ERR_ASSERTION]: should not be called twice
at res.<computed> [as fail] (/home/aras/workspace/undici/node_modules/@matteo.collina/tspl/tspl.js:58:35)
at Server.<anonymous> (/home/aras/workspace/undici/test/issue-3356.js:23:9)
at Server.emit (node:events:524:28)
at parserOnIncoming (node:_http_server:1141:12)
at HTTPParser.parserOnHeadersComplete (node:_http_common:118:17) {
generatedMessage: false,
code: 'ERR_ASSERTION',
actual: undefined,
expected: undefined,
operator: 'fail'
}
﹣ https://github.com/nodejs/undici/issues/3356 (0.122966ms) # SKIP
The process hangs...
I personally lack insight into these parts of the code, and I think it would be great if you would investigate it further. Should I close this PR?
Sure, I can do that
@Uzlopak
@artur-ma
This is what I see when I run my modified test on main:
aras@aras-HP-ZBook-15-G3:~/workspace/undici$ node test/issue-3356.js
✖ https://github.com/nodejs/undici/issues/3356 (1525.138235ms)
AssertionError [ERR_ASSERTION]: should not be called twice
at res.<computed> [as fail] (/home/aras/workspace/undici/node_modules/@matteo.collina/tspl/tspl.js:58:35)
at Server.<anonymous> (/home/aras/workspace/undici/test/issue-3356.js:23:9)
at Server.emit (node:events:524:28)
at parserOnIncoming (node:_http_server:1141:12)
at HTTPParser.parserOnHeadersComplete (node:_http_common:118:17) {
generatedMessage: false,
code: 'ERR_ASSERTION',
actual: undefined,
expected: undefined,
operator: 'fail'
}
﹣ https://github.com/nodejs/undici/issues/3356 (0.122966ms) # SKIP
The process hangs...
That sounds like an incorrect test... It's expected to be called twice, since this is the purpose of retrying on timeout. From what I understand, the case you are trying to fix is a different one: both sockets are active (data is written into them) at the same time because the first socket wasn't destroyed.
@artur-ma
Did you read what I wrote? Did you read the corresponding issue?
Exactly the opposite of what you wrote is the expected behavior.
@Uzlopak
@artur-ma
Did you read what I wrote? Did you read the corresponding issue?
Exactly the opposite of what you wrote is the expected behavior.
I read what you wrote, and this is exactly what I'm saying:
But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.
Retry is expected on timeout; the second call to the API is expected, since that is the purpose of the retry mechanism. What is not expected is that the old socket is still active after the timeout.
So how is it the opposite?
@artur-ma
If the server does not support ranges, no range headers are sent, there is no 206 status code, and maybe no ETag to verify, then we should not retry if a body was already sent. The test simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anyway.
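Roughly, a resume-style retry would only be safe when something like the following holds (a simplified sketch, not the actual RetryHandler logic):

```js
// Sketch: conditions under which resuming a partially received body is safe.
// `originalEtag` is the ETag seen on the first response, if any.
function canResume (statusCode, headers, originalEtag, bytesReceived) {
  if (bytesReceived === 0) {
    // nothing was consumed yet, so a plain retry is fine
    return true
  }
  if (statusCode !== 206 || !headers['content-range']) {
    // the server ignored our Range header and is sending the full body again
    return false
  }
  // if we saw an ETag, it must match, otherwise the resource may have changed
  return originalEtag === undefined || headers.etag === originalEtag
}
```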
@Uzlopak
@artur-ma
If the server does not support ranges, no range headers are sent, there is no 206 status code, and maybe no ETag to verify, then we should not retry if a body was already sent. The test simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anyway.
Thank you for the clarification, and still it sounds like the test is wrong: the logic you described is not handled right now on main, so after adding errorCodes: ['UND_ERR_BODY_TIMEOUT'] as a RetryAgent option it is expected to be retried, as you are setting it explicitly as a retryable error code.
If it's not set, the error will be thrown and bubble up to this line: https://github.com/nodejs/undici/blob/f182ff12d522bd420d47ba6489266ab172ad6b3f/lib/core/request.js#L314
which AFAIK basically does the same thing as the controller.abort() that you used in this PR.
The only difference here is that someone can catch the error down the line.
The logic you described is handled in the onResponseStart method. Which means that if the request was retried, then during the retry, when consuming the data the second time, we indicate that this error shouldn't have been retried in the first place (this flow should not happen, as a body timeout throws by default and no retry happens).
Please correct me if I got you wrong again.