
Fix usage of undocumented _implicitHeader

Open maritz opened this issue 7 years ago • 38 comments

As discussed in #127 this PR now only includes the changes that should not break node v0.8 and has tests.

Sadly supertest does not support http2 yet, so I had to resort to vanilla server/client creation.

I'm not destroying the client/server here, not sure if that's needed?!

maritz avatar Jan 10 '18 18:01 maritz

Right, tests are of course failing, because the http2 module doesn't exist in previous versions. D'oh.

I'll add a try catch around the require, thus only running http2 tests when http2 is available. Is that acceptable?
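Something along these lines, i.e. only registering the http2 suite when the module can be required (names here are just illustrative, not the exact test code):

var http2
try {
  http2 = require('http2') // only exists on Node.js versions that ship the http2 module
} catch (err) {
  http2 = null
}

// register the http2 suite only when the module is available
var describeHttp2 = http2 ? describe : describe.skip
describeHttp2('compression over http2', function () {
  // http2-specific tests go here
})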

maritz avatar Jan 10 '18 18:01 maritz

Right, now the tests pass, but the travis config doesn't have node v9 in it, so no http2 tests are run there.

Am I allowed to just add that?

Edit: I just did... Deal with it. :-P

maritz avatar Jan 10 '18 19:01 maritz

Yea, you can go ahead and update the Travis CI file. We just have one entry for the latest minor of each major. I'm not sure if any of the 8.x ones have the http2 module, but at least adding the most recent 9 would work. Make sure to validate that all functionality works with 9.x, including that no deprecation messages are printed, for this and all our dependencies before adding it to Travis CI 👍 that is our standard Travis CI process.

dougwilson avatar Jan 15 '18 16:01 dougwilson

There is one warning node currently gives when using http2: (node:4065) ExperimentalWarning: The http2 module is an experimental API.

As far as I can tell neither v8.8 nor v9.4 show any deprecation warnings.

maritz avatar Jan 15 '18 16:01 maritz

Thanks for checking, @maritz. Can you verify that you ran every test suite of every dependency of this module throughout the tree with Node.js 9.x as well to check this? This module doesn't fully exercise all the features of our own dependencies because the assumption is that they already have test suites, so we don't want to copy-and-paste every test suite up into each module. Many times the issue is only apparent when doing this, so we have to do this in order to bump any version in the Travis CI configuration 👍

dougwilson avatar Jan 15 '18 16:01 dougwilson

Can you verify that you ran every test suite of every dependency of this module throughout the tree with Node.js 9.x as well to check this?

Do you mean going through all 12 dependencies, checking out their git repos and manually running their tests? If so, no. I don't have the time to do that right now and quite frankly don't think that's a reasonable thing to do.

Or do you have some tool to do this? Or did I misunderstand?

maritz avatar Jan 15 '18 17:01 maritz

Yes, that is the correct understanding. It's no problem at all, I can do it 👍 I just didn't want to if you already did it. I just use some shell scripts that use jq and git to clone out the repos and then find + npm to run the tests through them in a single pass. We have been burned multiple times by not doing this, so it's a requirement in order to add new versions. We use the versions in Travis CI as the list of "verified compatible" versions, basically. We've only ever been burned on major upgrades, so this would be mainly for adding the 9.x series.
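Roughly, the idea is something like this (not the actual scripts, just a rough Node.js sketch of the same process; looking up repo URLs via npm view and walking only direct dependencies are simplifications of the real run, which goes through the whole tree):

// clone each dependency's repo and run its test suite against the current node
var execSync = require('child_process').execSync
var deps = Object.keys(require('./package.json').dependencies || {})

deps.forEach(function (name) {
  var repo = execSync('npm view ' + name + ' repository.url')
    .toString().trim().replace(/^git\+/, '')
  var dir = '/tmp/dep-check/' + name
  execSync('git clone ' + repo + ' ' + dir, { stdio: 'inherit' })
  execSync('npm install && npm test', { cwd: dir, stdio: 'inherit' })
})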

dougwilson avatar Jan 15 '18 17:01 dougwilson

Ok, I ran the tests of everything against Node.js 9.4 and updated the Travis CI in master 👍

dougwilson avatar Jan 15 '18 17:01 dougwilson

Hm, not sure why the tests are suddenly failing on Travis. Works fine locally with 9.4 and worked fine on Travis before merging the master to resolve the conflicts.

maritz avatar Jan 15 '18 18:01 maritz

Yea, I agree, it looks like nothing changed. Perhaps the test itself just has a race condition and it just happened to fail this time? If the test suddenly passes when rerun, it's probably a flaky test that would need to be debugged.

dougwilson avatar Jan 15 '18 18:01 dougwilson

Okay, now the tests passed. I'm hoping that these changes actually fixed it. Can't guarantee it though, since I don't truly understand what caused it in the first place.

Anything else I could do?

maritz avatar Jan 15 '18 20:01 maritz

@dougwilson Could you manually trigger the travis checks for this PR a couple of times to see if it still fails occasionally? Not a great thing to have to do, but better be safe, right?

maritz avatar Jan 17 '18 16:01 maritz

Yep, I can do that :+1: sorry for the delay, we're just trying to get a couple of security patches out asap. Jan seems to be the season of receiving security reports or something 😂

dougwilson avatar Jan 17 '18 16:01 dougwilson

Sorry to pester you, but is there any update on this?

maritz avatar Feb 16 '18 19:02 maritz

I'm very sorry, I totally forgot to follow up on your request to rerun the Travis CI job. I hit restart earlier today and it looks like it did indeed fail again as you suspected it may.

dougwilson avatar Feb 16 '18 22:02 dougwilson

There were changes in node v9.8.0 that have the potential to maybe fix this... That's about all I got for this right now. :-(

maritz avatar Mar 08 '18 16:03 maritz

I'm actually occasionally seeing this in the one app where I use this PR. I have no indication yet what causes it and it always takes the entire process down.

If anyone has any ideas on how to debug this, I'm very interested in it.

maritz avatar May 06 '18 13:05 maritz

I added a fix for a change in node 10.4 onward that caused test timeouts every time. This allowed me to add node v10.11 to the travis.yml targets, providing one more data point for the CI tests. I also added a bunch of error logging to the http2 tests.

However after several local runs and 10 forced CI runs I have not seen a single failure. There were plenty of changes in node v10 and even some in the v8 and v9 lines on http2, especially about closing connections and such. This might mean it is fixed. I haven't seen the problem in my production app in a while either (using node v10), but I may have disabled http2 for unrelated reasons at some point, I'll check that later tonight.

@dougwilson Since I added node v10 in the travis.yml, could you re-run the dependency checks? Maybe making this part of the test chain/tooling would be a good idea?

edit: I also rebased on master. :-)

edit2: As suspected I had disabled compression on my production http2 app, because it was causing problems and my reverse proxy/load balancer is handling compression there. I'm re-enabling compression on the app now, just to test it a bit.

maritz avatar Sep 27 '18 10:09 maritz

Sadly this is still happening, had to revert to the proxy doing compression again.

Error [ERR_HTTP2_INVALID_STREAM]: The stream has been destroyed
    at Http2ServerResponse.write (internal/http2/compat.js:599:19)
    at Gzip.onStreamData (/usr/src/app/node_modules/@maritz/compression/index.js:206:20)
    at Gzip.emit (events.js:182:13)
    at addChunk (_stream_readable.js:283:12)
    at readableAddChunk (_stream_readable.js:264:11)
    at Gzip.Readable.push (_stream_readable.js:219:10)
    at Gzip.Transform.push (_stream_transform.js:151:32)
    at Zlib.processCallback (zlib.js:591:10)

Not sure anymore if it's actually the same error we had before, I can't remember. I'll maybe try to see how to reliably reproduce it and add a failing test. Otherwise this PR is still a no-go.

maritz avatar Sep 29 '18 11:09 maritz

Just to clarify: The code changes in this PR should be completely fine - they just change _implicitHeader to writeHead.

There is just some problem with how compression works over http2. If you want, I can make yet another PR for just this code change. I don't know if _implicitHeader on http1 is going away anytime soon, so there is probably no hurry.
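For anyone skimming, the gist of the change is something like this (a simplified sketch, not the literal diff from the PR):

// before: forces headers to be flushed via the undocumented private method
if (!res.headersSent) {
  res._implicitHeader()
}

// after: uses the documented API, which also exists on the http2 compat response
if (!res.headersSent) {
  res.writeHead(res.statusCode)
}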

maritz avatar Sep 29 '18 18:09 maritz

@maritz, @dougwilson: need any help with this PR? I'd definitely prefer using this over rolling our own compression support for HTTP2.

DullReferenceException avatar Mar 27 '19 21:03 DullReferenceException

If you can figure out what is causing the seemingly random failures, that would be great.

I don't have the time and honestly probably lack understanding of how compression and http2 work to fix this properly.

maritz avatar Mar 28 '19 06:03 maritz

@maritz: do you have any hints on what your app does that might be provoking those failures? So far, I've been unable to reproduce that failure. It may be that the version of node 10 is different than what you were using. I see that some fixes (such as https://github.com/nodejs/node/pull/23146) have been made to HTTP2 since you saw this happening.

DullReferenceException avatar Mar 28 '19 18:03 DullReferenceException

do you have any hints on what your app does that might be provoking those failures?

Not really, it would just fail after working fine for a while. It's a next.js server that also serves a few static files. So nothing too fancy.

But the Travis CI also had failures sometimes. So I really don't know how to approach this.

maritz avatar Mar 29 '19 18:03 maritz

I'll be doing a bunch of testing of your published fork within our app soon; maybe that will be sufficient validation. Will report back.

DullReferenceException avatar Mar 29 '19 18:03 DullReferenceException

With the current commit, I could not reproduce the error @maritz reported. But I figured out two possible reasons that could cause the error.

  • First possible reason (the error found with Travis CI): Until this commit, the http2 test closed the http2 stream as soon as the client got the header frame ('response' is emitted when the header frame reaches the client): https://github.com/expressjs/compression/blob/521863cb435a9ce258fb6f477f031a7ed7e1cec3/test/compression.js#L325-L328. Since Zlib compression processes asynchronously, the client could get the header frame and close the stream before the server sent the body. I surmise that @maritz occasionally saw this issue and that it has since been solved. The issue can be reproduced 100% of the time with the following code:
var server = createHttp2Server({ threshold: 0 }, function (req, res) {
    res.setHeader('Content-Type', 'text/plain')
    res.writeHead(res.statusCode)
    setImmediate(()=>{res.end()})
})

server.on('listening', function () {
    var client = createHttp2Client(server.address().port)
    var request = client.request({
        'Accept-Encoding': 'gzip'
    })
    request.on('response', function (headers) {
        closeHttp2(request, client, server, ()=>{})
    })
    request.end()
})
  • Second possible reason (the issue found with next.js): When the server returns 304 (Not Modified) or 204 (No Content), the error can happen with compression. When the server returns status code 304 or 204, the node http2 module starts to close the stream. After that, calling res.end() should seemingly be safe. However, with compression, res is wrapped with a Zlib stream, and the Zlib stream emits data even for empty input (which is not a bug). So when the server calls res.end() on the wrapped Zlib stream, it eventually calls res.write(chunk), which can cause the error.

    With http1, if the statusCode is already set to 304 or 204, http_outgoing_message.write(chunk) checks whether the response has a body and [does nothing](https://github.com/nodejs/node/blob/master/lib/_http_outgoing.js#L619-L624). The http2 compat layer just throws an error.

    next.js serves static files with the send package. The send package checks whether the file is cached, returns 304 (Not Modified), and calls res.end().

    This issue can be reproduced 100% of the time with the following code:

var server = createHttp2Server({ threshold: 0 }, function (req, res) {
    res.setHeader('Content-Type', 'text/plain')
    res.writeHead(304)
    setImmediate(()=>{res.end()})
})

server.on('listening', function () {
    var client = createHttp2Client(server.address().port)
    var request = client.request({
        'Accept-Encoding': 'gzip'
    })
    request.end()
})

To fix the second issue, I think there are two approaches: A. check the status code in this package, or B. check the status code in the nodejs http2 compat layer. I prefer approach B, because it keeps compatibility with http1.
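For illustration, approach A could look roughly like this inside compression's wrapped res.end (just a sketch, not code from the actual PR; _end stands for the saved reference to the original res.end):

var _end = res.end // keep a reference to the original end before wrapping

res.end = function end (chunk, encoding, callback) {
  if (req.method === 'HEAD' || res.statusCode === 204 || res.statusCode === 304) {
    // no body allowed: skip the zlib stream entirely and end the raw response,
    // mirroring what http1's _http_outgoing does for these status codes
    return _end.call(this, callback)
  }
  // ... existing compression end logic (push chunk through the zlib stream) ...
}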

Any thoughts?

sogaani avatar Sep 04 '19 04:09 sogaani

do you have any ETA to get this merged?

wood1986 avatar Nov 22 '19 21:11 wood1986

I have re-enabled compression on my server with a current node version (v12.14.1) and have performed some long-running load tests on a locally running next.js server serving html (200 status code), with >1m compressed requests. I have also done a lot of manual loading of 200 and 304 requests with gzip compression on.

So far no crashes. I don't know if something changed in node or whether I've just been unlucky in reproducing it this time. I vote for merging and then dealing with potential issues if they arise.

Since all failures I've personally seen were with http2 anyways - not http1 - this would still be an improvement over the status quo of no one really using compression over http2 in the express-based world.

Otherwise, if anyone is interested in testing it without an official merge, you can use '@maritz/compression' from npm and report back how it works out for you.

maritz avatar Jan 19 '20 16:01 maritz

At the time, I was hoping that @sogaani's comments would be answered prior to merging

Yeah, I'll try to see what I can do about that. I obviously can't really make a decision regarding whether to check for 304 in the module or in node compat. But I can do some more tests with old node versions etc. to see if the 100% reproducible cases can be reasonably added to these tests.

wanted to note I do have a "rebase" label on this issue, as there are currently merge conflicts preventing a merge

My bad, I saw that at some point and then forgot about it later on. :-D

Regarding the test reviews: Yep, those logs and no-ops are suboptimal. I think I added the logs simply in case we do see the unexpected failures again, so they could be caught with this. But it's been a long time; I'll go over all the tests and make sure they are done properly.

maritz avatar Jan 21 '20 10:01 maritz

Can someone update this or open another PR to achieve the same results? It's been a known issue for quite some time and prevents using http2 with server frameworks (spdy, http2, next.js, etc.).

Separately, is there any reason to keep supporting older engines with serious security risks?

"engines": {
  "node": ">= 0.8.0"
}

Icehunter avatar Feb 11 '20 19:02 Icehunter