undici icon indicating copy to clipboard operation
undici copied to clipboard

Undici fails to read bodies from responses with implicit lengths

Open pimterry opened this issue 3 years ago • 10 comments

Bug Description

Paraphrasing the HTTP spec (RFC 7230 3.3.3), the message length for a normal response body can be defined in a few ways:

  • An explicit content-length header with a fixed length (point 3 in the RFC list)
  • Given a transfer-encoding header, a transfer coding chunk that explicitly ends the body (point 5 in the list)
  • If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)

Undici handles the last case incorrectly, making it impossible to read the HTTP response body for this simple "respond and then close the connection" case.

This doesn't come up much for big fancy servers, which tend to use keep-alive etc and explicit framing wherever they can, but it is common on quick simple HTTP server implementations, which avoid state and complexity by just streaming responses and closing the connection when they're done. It's a legitimate way to send responses according to the HTTP spec, and it is supported correctly by browsers and Node's http module, but not by Undici.

Reproducible By

Running in Node 18.0.0:

const http = require('node:http');

http.createServer((req, res) => {
    res.removeHeader('transfer-encoding');
    res.writeHead(200, {
        // Header isn't actually necessary, but tells node to close after response
        'connection': 'close'
    });
    res.end('a response body');
}).listen(8008);

fetch('http://localhost:8008').then(async (response) => { // Global fetch from Undici
    console.log('got response', response.status, response.headers);

    try {
        const responseText = await response.text()
        console.log('got body', responseText); // Should log the body OK here
    } catch (e) {
        console.log('error reading body', e); // Throws a 'terminated' error instead
    }
});

The server returns a simple response, just containing the default date header and a connection: close header, then sends the body, then closes the connection.

The connection: close header is for clarity - this should work equally well without that, just using res.socket.end() explicitly instead.

Expected Behavior

The above should print the status, headers, and then the response body, which is everything after the headers until the connection is closed.

This does work using fetch in browsers. To test this, run the script above (which will print the fetch error, but then keep running) then load localhost:8008 in your browser - the string loads successfully.

With that page loaded (for CORS) you can also run the equivalent command in the browser dev console, which also works and prints the response body correctly:

fetch('http://localhost:8008').then(async (response) =>
    console.log(await response.text()) // Logs the body correctly, no errors
);

This also works using Node's built-in HTTP module:

const http = require('http');
const streamConsumers = require('stream/consumers');

http.get('http://localhost:8008', async (res) => {
    const body = await streamConsumers.text(res);
    console.log('GET got body:', body); // Logs the body correctly
});

Logs & Screenshots

Undici's fetch does not print the response body, instead it throws an error:

error reading body TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:7881:53)
    at Fetch.emit (node:events:527:28)
    at Fetch.terminate (node:internal/deps/undici/undici:7135:14)
    at Object.onError (node:internal/deps/undici/undici:7968:36)
    at Request.onError (node:internal/deps/undici/undici:696:31)
    at errorRequest (node:internal/deps/undici/undici:2774:17)
    at Socket.onSocketClose (node:internal/deps/undici/undici:2236:9)
    at Socket.emit (node:events:527:28)
    at TCP.<anonymous> (node:net:715:12) {
  [cause]: SocketError: closed
      at Socket.onSocketClose (node:internal/deps/undici/undici:2224:35)
      at Socket.emit (node:events:527:28)
      at TCP.<anonymous> (node:net:715:12) {
    code: 'UND_ERR_SOCKET',
    socket: {
      localAddress: undefined,
      localPort: undefined,
      remoteAddress: undefined,
      remotePort: undefined,
      remoteFamily: 'IPvundefined',
      timeout: undefined,
      bytesWritten: 171,
      bytesRead: 90
    }
  }
}

Environment

Ubuntu

  • Node v18.0.0 with built-in fetch
  • Node v16.14.2 with Undici 5.1.1

pimterry avatar May 05 '22 15:05 pimterry

If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)

My reading of the spec is different.

If a Transfer-Encoding header field is present in a response and the chunked transfer coding is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.

i.e. the transfer-encoding header still needs to be present. Hence your example actually goes under the 4. invalid response clause.

ronag avatar May 05 '22 15:05 ronag

I don't think that paragraph applies. It starts with "If a transfer-encoding header field is present" and in this case there is no transfer-encoding header.

I'm referring to the final case, point 7:

Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.

Also, this seems very unlikely to be an invalid response as that would imply that the HTTP stacks of Chrome, Firefox and Node.js are all doing the wrong thing here. Point 4 regarding invalid response bodies says:

If this is a response message received by a user agent, the user agent MUST close the connection to the server and discard the received response.

Nobody else does so - everybody except Undici happily accepts this response as legit.

pimterry avatar May 05 '22 15:05 pimterry

PR welcome

ronag avatar May 05 '22 16:05 ronag

I'd love to, but it looks like this is going to be deep in llhttp, which doesn't seem easy to dive into.

I've skimmed the README from https://github.com/nodejs/llhttp and the llparse intro (https://llparse.org/) and I get the gist, but that's pretty spare and it's quite hard to immediately work out how the details work in practice.

Is there a contributing guide for llhttp, or any more documentation for llparse anywhere? I'm not clear how to write test cases or debug it, and it's not clear exactly where Undici interacts with it to check inputs & outputs at the entrypoints.

Looks like the trail starts here: https://github.com/nodejs/undici/blob/360e5d1aab47ac336f6507fcc0b8f158566dfc39/deps/llhttp/src/http.c#L80-L83

That seems intended to start handling this exact situation to me, but something's going wrong in the parsing for that case at a later step. Or maybe parsing is working correctly, but then Undici is failing due to the connection close somewhere before it realises the response is complete & OK?

pimterry avatar May 05 '22 17:05 pimterry

Doing some digging, I've just run into https://github.com/nodejs/undici/issues/1412.

Pretty sure that's caused by this underlying issue (and the HTTP/2 mention there is an unrelated red herring). I can reproduce the exact same terminated error as here with the example site they use:

fetch('https://www.dailymail.co.uk/').then(res => res.text()).then(console.log)

Logging the headers too, they do include connection: close with no transfer-encoding or content-length, and Undici fails to read the body in the same way.

That suggests this failing edge case is a lot more common in reality than I'd expected, since this is the landing page of one of the top 100 most visited websites in the world (https://www.semrush.com/website/dailymail.co.uk/ says it's 73rd - they're terrible but sadly popular) and other headers suggest this response is being served by Akamai.

pimterry avatar May 05 '22 17:05 pimterry

it looks like this is going to be deep in llhttp

No it's not in llhttp. Node.js uses llhttp as well and it implements this.

mcollina avatar May 06 '22 11:05 mcollina

yep getting the same error trying to fetch github graphql api https://api.github.com/graphql

jcheese1 avatar Jul 12 '22 03:07 jcheese1

A PR to fix this would be nice.

mcollina avatar Jul 12 '22 08:07 mcollina

I have figured out the problem and I am currently figuring out how best to fix it.. Might not get the PR up before dinner, but hopefully in the next couple of hours.

evanderkoogh avatar Jul 12 '22 08:07 evanderkoogh

@evanderkoogh did we fix this?

ronag avatar Aug 22 '22 06:08 ronag