Undici fails to read bodies from responses with implicit lengths
Bug Description
Paraphrasing the HTTP spec (RFC 7230 3.3.3), the message length for a normal response body can be defined in a few ways:
- An explicit
content-lengthheader with a fixed length (point 3 in the RFC list) - Given a
transfer-encodingheader, a transfer coding chunk that explicitly ends the body (point 5 in the list) - If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)
Undici handles the last case incorrectly, making it impossible to read the HTTP response body for this simple "respond and then close the connection" case.
This doesn't come up much for big fancy servers, which tend to use keep-alive etc and explicit framing wherever they can, but it is common on quick simple HTTP server implementations, which avoid state and complexity by just streaming responses and closing the connection when they're done. It's a legitimate way to send responses according to the HTTP spec, and it is supported correctly by browsers and Node's http module, but not by Undici.
Reproducible By
Running in Node 18.0.0:
const http = require('node:http');
http.createServer((req, res) => {
res.removeHeader('transfer-encoding');
res.writeHead(200, {
// Header isn't actually necessary, but tells node to close after response
'connection': 'close'
});
res.end('a response body');
}).listen(8008);
fetch('http://localhost:8008').then(async (response) => { // Global fetch from Undici
console.log('got response', response.status, response.headers);
try {
const responseText = await response.text()
console.log('got body', responseText); // Should log the body OK here
} catch (e) {
console.log('error reading body', e); // Throws a 'terminated' error instead
}
});
The server returns a simple response, just containing the default date header and a connection: close header, then sends the body, then closes the connection.
The connection: close header is for clarity - this should work equally well without that, just using res.socket.end() explicitly instead.
Expected Behavior
The above should print the status, headers, and then the response body, which is everything after the headers until the connection is closed.
This does work using fetch in browsers. To test this, run the script above (which will print the fetch error, but then keep running) then load localhost:8008 in your browser - the string loads successfully.
With that page loaded (for CORS) you can also run the equivalent command in the browser dev console, which also works and prints the response body correctly:
fetch('http://localhost:8008').then(async (response) =>
console.log(await response.text()) // Logs the body correctly, no errors
);
This also works using Node's built-in HTTP module:
const http = require('http');
const streamConsumers = require('stream/consumers');
http.get('http://localhost:8008', async (res) => {
const body = await streamConsumers.text(res);
console.log('GET got body:', body); // Logs the body correctly
});
Logs & Screenshots
Undici's fetch does not print the response body, instead it throws an error:
error reading body TypeError: terminated
at Fetch.onAborted (node:internal/deps/undici/undici:7881:53)
at Fetch.emit (node:events:527:28)
at Fetch.terminate (node:internal/deps/undici/undici:7135:14)
at Object.onError (node:internal/deps/undici/undici:7968:36)
at Request.onError (node:internal/deps/undici/undici:696:31)
at errorRequest (node:internal/deps/undici/undici:2774:17)
at Socket.onSocketClose (node:internal/deps/undici/undici:2236:9)
at Socket.emit (node:events:527:28)
at TCP.<anonymous> (node:net:715:12) {
[cause]: SocketError: closed
at Socket.onSocketClose (node:internal/deps/undici/undici:2224:35)
at Socket.emit (node:events:527:28)
at TCP.<anonymous> (node:net:715:12) {
code: 'UND_ERR_SOCKET',
socket: {
localAddress: undefined,
localPort: undefined,
remoteAddress: undefined,
remotePort: undefined,
remoteFamily: 'IPvundefined',
timeout: undefined,
bytesWritten: 171,
bytesRead: 90
}
}
}
Environment
Ubuntu
- Node v18.0.0 with built-in fetch
- Node v16.14.2 with Undici 5.1.1
If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)
My reading of the spec is different.
If a Transfer-Encoding header field is present in a response and the chunked transfer coding is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.
i.e. the transfer-encoding header still needs to be present. Hence your example actually goes under the 4. invalid response clause.
I don't think that paragraph applies. It starts with "If a transfer-encoding header field is present" and in this case there is no transfer-encoding header.
I'm referring to the final case, point 7:
Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
Also, this seems very unlikely to be an invalid response as that would imply that the HTTP stacks of Chrome, Firefox and Node.js are all doing the wrong thing here. Point 4 regarding invalid response bodies says:
If this is a response message received by a user agent, the user agent MUST close the connection to the server and discard the received response.
Nobody else does so - everybody except Undici happily accepts this response as legit.
PR welcome
I'd love to, but it looks like this is going to be deep in llhttp, which doesn't seem easy to dive into.
I've skimmed the README from https://github.com/nodejs/llhttp and the llparse intro (https://llparse.org/) and I get the gist, but that's pretty spare and it's quite hard to immediately work out how the details work in practice.
Is there a contributing guide for llhttp, or any more documentation for llparse anywhere? I'm not clear how to write test cases or debug it, and it's not clear exactly where Undici interacts with it to check inputs & outputs at the entrypoints.
Looks like the trail starts here: https://github.com/nodejs/undici/blob/360e5d1aab47ac336f6507fcc0b8f158566dfc39/deps/llhttp/src/http.c#L80-L83
That seems intended to start handling this exact situation to me, but something's going wrong in the parsing for that case at a later step. Or maybe parsing is working correctly, but then Undici is failing due to the connection close somewhere before it realises the response is complete & OK?
Doing some digging, I've just run into https://github.com/nodejs/undici/issues/1412.
Pretty sure that's caused by this underlying issue (and the HTTP/2 mention there is an unrelated red herring). I can reproduce the exact same terminated error as here with the example site they use:
fetch('https://www.dailymail.co.uk/').then(res => res.text()).then(console.log)
Logging the headers too, they do include connection: close with no transfer-encoding or content-length, and Undici fails to read the body in the same way.
That suggests this failing edge case is a lot more common in reality than I'd expected, since this is the landing page of one of the top 100 most visited websites in the world (https://www.semrush.com/website/dailymail.co.uk/ says it's 73rd - they're terrible but sadly popular) and other headers suggest this response is being served by Akamai.
it looks like this is going to be deep in llhttp
No it's not in llhttp. Node.js uses llhttp as well and it implements this.
yep getting the same error trying to fetch github graphql api https://api.github.com/graphql
A PR to fix this would be nice.
I have figured out the problem and I am currently figuring out how best to fix it.. Might not get the PR up before dinner, but hopefully in the next couple of hours.
@evanderkoogh did we fix this?