Streaming big files causes random chunk duplication
👋
While building a hobby project with hyper-express, I noticed that when I stream a file to the response, the checksum of the received response doesn't match the original file. The chance of this happening increases with bigger files; I can reproduce it consistently with a file of around 256M. The response seems to contain some duplicated parts.
webserver.get("/", (request, response) => {
    const stream = fs.createReadStream("./sample.txt");
    stream.pipe(response);
});
I also tried reading the file stream chunk by chunk and writing each chunk into the response with res.write(chunk) while handling backpressure, but that did not change anything; the bug still appears. After switching to Node's built-in http server, this was no longer an issue, which is why I suspect something is going on with hyper-express.
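For comparison, the built-in http control looks roughly like this (not my exact code, just the shape of what I tested; the port is arbitrary):

const http = require("http");
const fs = require("fs");

// Control case: piping the same file through Node's built-in http server
// consistently produces a matching checksum.
http.createServer((request, response) => {
    fs.createReadStream("./sample.txt").pipe(response);
}).listen(3000);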
I made this little repo to help reproduce this: https://github.com/amir-s/debug-hyper-express
You can run npm install and npm run generate to generate the sample.txt file.
❯ shasum sample.txt
4005870368b86b280241605852047f4f390f7b11 sample.txt
Then you can start the server with npm start and download the file via curl. I ran it a few times, and each time the size of the downloaded file is different, and so is the checksum.
❯ curl localhost:3000 -o downloaded.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 271M 0 271M 0 0 1583M 0 --:--:-- --:--:-- --:--:-- 1623M
❯ shasum downloaded.txt sample.txt
740c6f2555c822eb0c706608ab60a3c42f221624 downloaded.txt
4005870368b86b280241605852047f4f390f7b11 sample.txt
❯ curl localhost:3000 -o downloaded.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 256M 0 256M 0 0 1394M 0 --:--:-- --:--:-- --:--:-- 1443M
❯ shasum downloaded.txt sample.txt
a6426e5f5ef80cf631d467250f127f7c7a5720b0 downloaded.txt
4005870368b86b280241605852047f4f390f7b11 sample.txt
I'm using node v20.14.0.
I see, I'll begin to look into this further, but it seems to be a bug related to backpressure handling. The problem is I can't seem to pinpoint any logical bug in the backpressure handling. I'll try to do more gradual debugging to see if it's a bug within uWebSockets.js itself.
@kartikk221 Thank you!
I did a little digging, and it appears the chunks that cause backpressure get sent to the client twice. The logic for handling backpressure looks fine to me, as far as I understand it.
https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L619-L633
However, the offset is always 0 for whatever reason. But judging by what actually happens, the chunk has already been fully received by the client by the time the buffer is drained. Essentially, if I manually force offset = chunk.length; right before const remaining = chunk.slice(offset - write_offset);, things work.
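To make that concrete, here is a simplified model of the retry computation as I read it (my paraphrase, not the actual Response.js code; write_offset is, as far as I can tell, the write offset recorded before the original write):

function remainingAfterDrain(chunk, write_offset, offset) {
    // offset comes back as 0 on drain, so this slice re-sends bytes the client
    // has already received; forcing offset = chunk.length yields an empty slice
    // and the checksums match again.
    return chunk.slice(offset - write_offset);
}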
Where does the offset here come from?
https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L398-L412
Is it from uWebSockets.js?
Yes, the offset argument comes from uWebSockets.js; it is supposed to tell you how many bytes of the original chunk were written. At least that is how I understand it, which is why I don't really see a problem in the backpressure / retry logic.
Could this be related? https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083081077
That would explain it, actually. So we shouldn't use the offset from the onWritable callback, since the write call will buffer and write the whole chunk even if it returns false due to backpressure.
One thing that I am confused about in https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083081077 is where ghost mentions "(but do not use offset, use your offset)".
What could he mean by using our own offset? Does he just mean serving the next chunk?
This also raises another question: what is the boolean returned from the onWritable callback supposed to mean? If we are not supposed to retry the failed chunk, even partially, should we simply continue execution and return true anyway without writing anything within the onWritable / drain itself?
I have the same question!
I guess at this point, when backpressure happens, the only thing we need to do is wait for the drain and then continue.
Something like:
if (sent) {
    // The chunk was fully sent, we can resolve the promise
    resolve();
} else {
    this.drain(() => {
        resolve();
        return true;
    });
}
The return true part feels like cheating a little!
According to https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083086046, tryWrite does not buffer if it fails; write, in contrast, does buffer if it fails. Since hyper-express uses write:
https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L575
I guess we can be sure it buffers the bytes that are not sent.
Another solution would have been to use tryWrite, but according to ghost (and I also double-checked), it has not been implemented!
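For the record, the drain-only approach looks roughly like this directly against uWebSockets.js (a sketch based on my understanding of the semantics above, not hyper-express code; newer uWS versions also suggest wrapping writes in res.cork() when responding asynchronously):

const uWS = require("uWebSockets.js");
const fs = require("fs");

uWS.App().get("/*", (res, req) => {
    const stream = fs.createReadStream("./sample.txt");
    res.onAborted(() => stream.destroy());

    stream.on("data", (chunk) => {
        // write() accepts the whole chunk even when it returns false; false only
        // signals backpressure, so nothing must be re-sent later.
        const ok = res.write(chunk);
        if (!ok) {
            stream.pause();
            res.onWritable(() => {
                // Nothing left to retry here; just resume the source,
                // matching the snippet above.
                stream.resume();
                return true;
            });
        }
    });

    stream.on("end", () => res.end());
}).listen(3000, () => {});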
Can you try modifying your local hyper-express source code in the node_modules folder to see if your issue is resolved with your fix?
I did and it fixed my issue. But I fear it's gonna break some other stuff. In my case total_size is undefined since it is optional. But I see here:
https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L555-L572
If total_size is provided, we'd use tryEnd which then relies on us handling the offset to resend the chunks when backpressure happens.
I think we can also switch to end instead of tryEnd to make sure all backpressure handling logic assumes the chunks that cause backpressure are handled by the uWS buffer. What do you think?
Actually .end does not quite work. The tests fail in my PR.
/Users/amir/src/github.com/amir-s/hyper-express/src/components/http/Response.js:570
const [ok, done] = this._raw_response.end(chunk, total_size);
^
TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator))
No, you cannot use .end() since .end() is meant to be the final call that ends the response. When a total_size is provided, we have to use tryEnd. Why don't you try this: only use the offset / partial chunk writes if we are doing tryEnd; otherwise, just do the behavior we are experimenting with, where we assume .write() will buffer and write the chunk fully.
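For clarity, the offset-based retry that the tryEnd path needs looks roughly like this at the raw uWS level (a simplified sketch that loads the whole file into memory just to keep it short; the real streaming version tracks a per-chunk write offset, as in the uWS examples):

const uWS = require("uWebSockets.js");
const fs = require("fs");

const file = fs.readFileSync("./sample.txt"); // whole file in memory only for the sketch

uWS.App().get("/*", (res, req) => {
    res.onAborted(() => {
        // a real implementation would stop retrying here
    });

    // tryEnd does NOT buffer the unsent tail, so on backpressure we must retry
    // from the offset reported to onWritable.
    const [ok, done] = res.tryEnd(file, file.length);
    if (!done && !ok) {
        res.onWritable((offset) => {
            const [retried] = res.tryEnd(file.slice(offset), file.length);
            return retried;
        });
    }
}).listen(3000, () => {});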
If that's the case, I think the simplest solution would be to resolve the promise and return early when total_size is not provided. We already have a condition we can re-use. I updated the draft PR: https://github.com/kartikk221/hyper-express/pull/289.
What do you think?
Could you test both your own use case and also the test suite to see if your changes pass? I'll do the same on my end and make any refinements to your PR as needed.
I tried the fix by streaming a large file both with and without providing the total size. Without my PR, things only work properly when the total size is provided; after the fix, both cases work fine.
I tried providing the total size via res.stream(fileStream, size). I can see it goes through this._raw_response.tryEnd(chunk, total_size); and the offset also gets the proper values.
I can also omit the total size in the same scenario, and things still work properly with the fix applied.
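Concretely, the two cases I tested look like this (route paths and port are just for illustration):

const HyperExpress = require("hyper-express");
const fs = require("fs");

const webserver = new HyperExpress.Server();

// Total size provided: goes through the tryEnd path.
webserver.get("/with-size", (request, response) => {
    const { size } = fs.statSync("./sample.txt");
    response.stream(fs.createReadStream("./sample.txt"), size);
});

// No total size: goes through the write() + drain path.
webserver.get("/without-size", (request, response) => {
    response.stream(fs.createReadStream("./sample.txt"));
});

webserver.listen(3000);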
I'd love to contribute and add a couple of tests too, but it's a little hard to navigate the tests. Any pointers?
bump! @kartikk221
This bug has been fixed in the recent v6.17.1 update of hyper-express, which is now live.
@amir-s Apologies for the long delay; I was extremely busy with work, so I was unable to handle this until now.