Add ignore_vary option
This adds a configuration option to completely ignore all Vary headers.
It is well known that squid does not fully implement Vary headers, in particular with shared memory caching as mentioned in bug 4445 comment 3. We are running into this now that we have our larger installations using SMP. We control our servers and clients, and the servers never send Vary and the clients never vary their request types, but sometimes we insert a Cloudflare CDN reverse proxy between our forward proxy squid and server, where we have little control, and those insert Vary: accept-encoding. That causes those responses to always be treated as stale by squid after their first expiration. I have not been able to find another way to avoid that besides this option.
To reproduce, set http_proxy to point to a squid configured for SMP and run
curl -vs http://s1cern-cvmfs.openhtc.io/cvmfs/info/v1/meta.json | wc -c
After it expires in 61 seconds, run it again a few times. Without this option the Cache-Status always includes stale. With this option, it is stale once, then miss, then hit. I don't know why there's an intermediate miss, but it's not nearly as big of a problem for us to have that one extra miss as to have every request be stale.
As an aside: we usually use collapsed_forwarding, but this problem happens with or without collapsed_forwarding. Our server does not send Last-Modified, but Cloudflare inserts that too. I know that SMP does not support If-Modified-Since with collapsed_forwarding (bug #4890), but we work around that by prohibiting sending any If-Modified-Since requests with
request_header_access If-Modified-Since deny all
Cannot create a git commit message from PR title and description.
Error while parsing PR description body: the line is too long 73>72
Problematic parser input:
This adds a configuration option to completely ignore all `Vary` headers.
Please see PR title and description formatting requirements for more details.
This message was added by Anubis bot. Anubis will add a new message if the error text changes. Anubis will remove M-failed-description label when there are no corresponding failures to report.
I have not seen Amos' review when posting mine. FWIW, the alternative solution I have proposed would effectively work around concerns in Amos' review -- those concerns would apply to a particular Squid configuration instead (and addressing them would be up to the admin deploying that configuration, of course -- Squid Project reviewers would not stand in the way).
I would like to hear others feedback regarding the alternative solution above before deciding whether to block the addition of a [fixed]
ignore_varyimplementation (and ask you to addincoming_reply_header_accessor a similar general directive instead). CC: @kinkie, @yadij
I outlined in my review a few implementation requirements we need not to break HTTP quite badly. IMHO though it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.
it would be better spending time to fix the Vary related bugs (4445, 4525, and 4890) that cause the unwanted behaviour.
We are in agreement on that, but it may take a very long time and requires special expertise... The alternative incoming_reply_header_access directive is generally useful and does not have those problems.
The alternative
incoming_reply_header_accessdirective is generally useful and does not have those problems.
AIUI, this alternative meets the ACL-driven requirement. "generally useful" is a bad thing - the protocol impact is expanded from just Vary to all possible content negotiation headers, including future/unknown ones. Making the important request-filtering action highly difficult to implement correctly.
Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming. I will probably instead just leave this applied as a patch on our project's Squid distribution until the Vary behavior in Squid improves, if it does.
We all agree that the better thing would be to fix the Vary bug(s), but as @rousskov said, that requires specialized knowledge, and I don't have that. My applications also don't need Vary at all so I can't justify spending significant resources on it.
@yadij I'm not surprised that Cloudflare behaves as it's supposed to with varied request headers, but I know that our applications don't send any so I'm not worried about that. I will indeed include request_header_access Accept-Encoding deny all to make sure nobody sends that just in case, thanks for that hint. One of our two major applications streams data from a database, so it is impossible for it to calculate an ETag in time to insert it as a header.
@rousskov I was wishing for something like an incoming_reply_header_access feature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler. It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.
Well! I didn't expect this to be accepted as-is, but I also didn't expect such vigorous responses. I was hoping for some small hints to make it better, but it looks like you're both asking for different things and both of them are very complex and time consuming.
FWIW, I am not asking for very complex or very time consuming things.
@rousskov I was wishing for something like an
incoming_reply_header_accessfeature and I definitely would have used it if it were available, but I didn't know if it would be desired so I started with something much simpler.
I understand. FWIW, when considering adding new features, I recommend asking first.
It sounds like @yadij doesn't want it though. If he did I could possibly look into seeing what it would take to implement that based on existing code and a few hints such as where to hook in.
I disagree with Amos "generally useful is a bad thing" opinion: IMO, correct/good applications justify the inevitable abuses in incoming_reply_header_access case. However, this disagreement is of secondary importance here. What really matters is the answer to the following question:
@yadij, do you expect to veto a quality PR that adds a general filtering directive like incoming_reply_header_access [on the grounds that such a directive can be misused]?
when considering adding new features, I recommend asking first.
I thought about that but I knew it was a known problem, and I needed assurance that I could find a workaround. Once I found one I thought it would be best to propose it and see what came of it.