undici icon indicating copy to clipboard operation
undici copied to clipboard

poc: original request is not aborted on bodytimeout (only retry handler?)

Open Uzlopak opened this issue 4 months ago • 11 comments

This PR is far from perfect! Actually started as an approach to deflake test/isue-3356.js.

Please have a look at the code and the tests. It is hard to explain:

It seems that the flakyness actually shows that we have some underlying issue. If we get a bodyTimeout, it doesnt mean that the connection is closed. It just means that the body did not finish in the expected time. Ok, no problem is the connection closes after some time. But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

So I assume, we have to abort the request before we retry with a new request. Maybe even check how much data we buffered already, and set the corresponding content-range headers.

But tbh. I am kind of lost in this part of the code. So I show you this, maybe you have better ideas.

This relates to...

Rationale

Changes

Features

Bug Fixes

Breaking Changes and Deprecations

Status

  • [ ] I have read and agreed to the Developer's Certificate of Origin
  • [ ] Tested
  • [ ] Benchmarked (optional)
  • [ ] Documented
  • [ ] Review ready
  • [ ] In review
  • [ ] Merge ready

Uzlopak avatar Aug 26 '25 13:08 Uzlopak

IMHO I do not think the solution u are looking for is entirely in retry handler, you have two issues described in the description of the PR

you described two issues here

But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

How can it happen, if you destroy the socket on bodyTimeout?

util.destroy(socket, new BodyTimeoutError())

Maybe even check how much data we buffered already, and set the corresponding content-range headers.

Thats the only implementation that should be in retry handler There is no reason to stop the retry process on body timeout AFAIK

fatal10110 avatar Aug 28 '25 06:08 fatal10110

@metcoder95

I personally lack the insights in these parts and I think it would great if you would investigate it further. Should i close this PR?

Uzlopak avatar Aug 30 '25 09:08 Uzlopak

@fatal10110

I dont think the socket gets destroyed. Would need to investigate though. But i guess it is because we are not directly working on the h1 client? Idk.

Anyhow imho undici is acting strange. This PR was just a poc. Maybe everything is fine and i am wrong...

Uzlopak avatar Aug 30 '25 09:08 Uzlopak

Is there a way / test to repoduce the issue u are describing? Running the test on main brach, it always passes

artur-ma avatar Aug 31 '25 08:08 artur-ma

@artur-ma

This is what i see, when i run my modified test on main:

aras@aras-HP-ZBook-15-G3:~/workspace/undici$ node test/issue-3356.js 
✖ https://github.com/nodejs/undici/issues/3356 (1525.138235ms)
  AssertionError [ERR_ASSERTION]: should not be called twice
      at res.<computed> [as fail] (/home/aras/workspace/undici/node_modules/@matteo.collina/tspl/tspl.js:58:35)
      at Server.<anonymous> (/home/aras/workspace/undici/test/issue-3356.js:23:9)
      at Server.emit (node:events:524:28)
      at parserOnIncoming (node:_http_server:1141:12)
      at HTTPParser.parserOnHeadersComplete (node:_http_common:118:17) {
    generatedMessage: false,
    code: 'ERR_ASSERTION',
    actual: undefined,
    expected: undefined,
    operator: 'fail'
  }

﹣ https://github.com/nodejs/undici/issues/3356 (0.122966ms) # SKIP

The process hangs...

Uzlopak avatar Aug 31 '25 09:08 Uzlopak

I personally lack the insights in these parts and I think it would great if you would investigate it further. Should i close this PR?

Sure, I can do that

metcoder95 avatar Sep 01 '25 06:09 metcoder95

@Uzlopak

@artur-ma

This is what i see, when i run my modified test on main:

aras@aras-HP-ZBook-15-G3:~/workspace/undici$ node test/issue-3356.js 
✖ https://github.com/nodejs/undici/issues/3356 (1525.138235ms)
  AssertionError [ERR_ASSERTION]: should not be called twice
      at res.<computed> [as fail] (/home/aras/workspace/undici/node_modules/@matteo.collina/tspl/tspl.js:58:35)
      at Server.<anonymous> (/home/aras/workspace/undici/test/issue-3356.js:23:9)
      at Server.emit (node:events:524:28)
      at parserOnIncoming (node:_http_server:1141:12)
      at HTTPParser.parserOnHeadersComplete (node:_http_common:118:17) {
    generatedMessage: false,
    code: 'ERR_ASSERTION',
    actual: undefined,
    expected: undefined,
    operator: 'fail'
  }

﹣ https://github.com/nodejs/undici/issues/3356 (0.122966ms) # SKIP

The process hangs...

That sounds like incorrect test.. Its expected to be called twice, since this is the purpose of retry on timeout From what I understand, the case you are trying to fix is another one, that both sockets are active (data is written in to it) at the same time because the first socket wasnt destryed.

artur-ma avatar Sep 04 '25 08:09 artur-ma

@artur-ma

Did you read what I wrote? Did you read the corresponding issue?

Exactly the opposite of what you wrote is the expected behavior.

Uzlopak avatar Sep 06 '25 07:09 Uzlopak

@Uzlopak

@artur-ma

Did you read what I wrote? Did you read the corresponding issue?

Exactly the opposite of what you wrote is the expected behavior.

I read what u wrote, and this is exactly what Im saying

But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

Retry is expected on timeout, the second call to the API is expected this is the purpose of retry mechanism, what is not expected, is that the old socket still be active after timeout

So how is it the opossite?

artur-ma avatar Sep 07 '25 05:09 artur-ma

@artur-ma

If the server does not support ranges, no range headers are sent, no 206 status code and maybe no etag to verify, then we should not retry if a body was already sent. The test does simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anway.

Uzlopak avatar Sep 07 '25 08:09 Uzlopak

@Uzlopak

@artur-ma

If the server does not support ranges, no range headers are sent, no 206 status code and maybe no etag to verify, then we should not retry if a body was already sent. The test does simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anway.

Thank you for clarification, and still it sounds like the test is wrong, the logic u described is not handled right now in mian, so after adding errorCodes: ['UND_ERR_BODY_TIMEOUT'] as RetryAgent option it is expected to be retried, as u are setting it explicitly as retryable error code.

If its not set, the error will be thrown and bubble up to this line https://github.com/nodejs/undici/blob/f182ff12d522bd420d47ba6489266ab172ad6b3f/lib/core/request.js#L314

which AFAIK basically does the same thing as controller.abort() that u used in this PR The only difference here is that someone can catch the error down the line

The logic you described is handled in a method onResponseStart Which means, if the request was retried, on the retry process on consuming the data on the second time, we indicate that this error shouldn't be retried in the first place (this flow should not happen as body timeout by default throws, and no retry happens)

please correct me if I get you wrong again.

artur-ma avatar Sep 09 '25 08:09 artur-ma