nchan icon indicating copy to clipboard operation
nchan copied to clipboard

Message ID invalid when If-Modified-Since header exists

Open qhenkart opened this issue 6 years ago • 10 comments

In the past few weeks I have started seeing an increasing number of subscription messages fail to 400. Sometimes the 400 is empty, other times I get a Message ID invalid error. I've upgraded to the latest release of nchan and using openresty 10.13.6.2. I've simplified the subscription requests to try to narrow down what is wrong. If I omit the If-Modified-Since header, there is no issue. However the presence of this header causes the 400

I am using long-polling and your official js package which automatically sets the header here https://github.com/slact/nchan.js/blob/master/NchanSubscriber.js#L844

A simplified subscription config is here

  location ~* "^/sub/<some regex>" {
    nchan_subscriber;
    nchan_subscriber_first_message newest;
    nchan_subscriber_timeout 0;
    nchan_channel_id $1 profile_$2
    nchan_redis_pass redisdb;
  }

any ideas on what would cause a 400 with an invalid message ID error?

Here is a redacted example of a --verbose cURL request

curl '<redacted>' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: en-US,en;q=0.9' -H 'Accept: */*' -H 'Cookie: sid=<redacted>' -H 'Connection: keep-alive' -H 'If-Modified-Since: Wed, 28 Nov 2018 07:00:08 GMT' -H 'Cache-Control: max-age=0' --verbose GET /sub/<redacted> HTTP/1.1
> Host: <redacted>
> User-Agent: curl/7.54.0
> Accept-Encoding: gzip, deflate, br
> Accept-Language: en-US,en;q=0.9
> Accept: */*
> Cookie: sid=<redacted>
> Connection: keep-alive
> If-Modified-Since: Wed, 28 Nov 2018 07:00:08 GMT
> Cache-Control: max-age=0
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain
< Date: Wed, 28 Nov 2018 07:24:30 GMT
< Server: nginx/1.15.5
< x-channel-id: <redacted>
< x-profile-id: <redacted>
< Content-Length: 18
< Connection: keep-alive
<
* Connection #0 to host <redacted> left intact
Message ID invalid

interestingly enough, I cannot duplicate this error on my local environment

qhenkart avatar Nov 28 '18 07:11 qhenkart

I was able to resolve this using https://github.com/slact/nchan/issues/126 and the nchan_subscriber_compound_etag_message_id option. I'm still unsure why the default message-since strategy was not working.

qhenkart avatar Nov 28 '18 08:11 qhenkart

going to investigate this.

slact avatar Dec 01 '18 23:12 slact

@slact now even with etags, I am getting the same issue because the request includes the If-None-Match header of -1:0

qhenkart avatar Dec 14 '18 04:12 qhenkart

I dug into the code a bit more. It seems my first subscription is failling with a 410, the response of that failure sets the etag to -1:0 which in turn creates 400 bad requests when trying to reconnect. It is possible a similar thing was occuring for the original issue, where a 410 response sets a bad response header.

I am not sure why the 410 is happening

qhenkart avatar Dec 14 '18 05:12 qhenkart

investigating further. I am more confident that all of this is related. A 410 seems to return a bad value for both last-modified and etag headers which gets subsequent requests to hit 400.

the cause of the 410 is perplexing as well. The first request is barebones, it has just a simple empty request and does not include the cookies or headers , however the second request does and makes it all the way through as expected (but with a 400 because the etag is already corrupt. This is the cURL request for the 410 curl 'http://redacted.com/sub/someID' -H 'Referer: http://redacted.com/someID' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36' --compressed

which is completely missing the cookies

qhenkart avatar Dec 14 '18 05:12 qhenkart

I realize this is a caching issue. The 410 occurs from some intermittent error, it then gets cached forever and ensures that no successful subscription can be made since the headers are corrupt

qhenkart avatar Jan 20 '19 07:01 qhenkart

I was finally able to resolve the issue using the transportSetup event and adding a no cache header

qhenkart avatar Jan 20 '19 08:01 qhenkart

additionally, to stop fresh 410s from destroying nchan until a user refreshes. I'm now checking the If-None_Match is a negative value and removing it if it is. This has no fully resolved my issues

lua script

  if headers["If-None-Match"] == "-1:0" then
    ngx.req.clear_header("If-None-Match")
  end

qhenkart avatar Jan 24 '19 04:01 qhenkart

@slact circling back here since my workarounds of course led to issues later on (although kept things afloat for a few months). I have a bit better understanding of how nchan works now and it allowed me to debug more efficiently. All of the issues described in this post are simply the result of the Etag being stripped by gzip . (looks like this was fixed previously https://github.com/slact/nchan/issues/92).

However unlike the previous fix which was due to nginx gzip, our situation was due to Cloudfront gzip which also strips the Etag and has a lot less granularity since it ignores any gzip settings in nginx aws documentation mentioning etag

After a lot of playing around with various solutions. I finally found a way forward from this line in the aws documentation, listing the requirements that allow gzipping The response must not include a Content-Encoding header.

Adding a add_header Content-Encoding 'identity' to the subscription location in nginx solved the issue and I'm finally back to a 100% stable implementation.

I feel this set up (running nginx and AWS CloudFront with gzip) is a common enough design pattern that this should either be in the documentation or perhaps Content-Encoding should be set to identity by default for all subscription requests

qhenkart avatar Apr 17 '19 12:04 qhenkart

Yep, that's quite useful indeed. I'll be adding this to the documentation. (It might have to wait until the next release or two though as I need to reformat the docs a bit, but I'll get this in there for sure.) Thanks for your research!

slact avatar Apr 18 '19 17:04 slact