squid
squid copied to clipboard
add refresh_pattern option 'ignore-must-revalidate' back in
This is a partial revert of commit 064679ea374d2f58ae4660d9a2af213b9be24bba, which removed the "ignore-must-revalidate" option. The option was not actually broken, but it is not about the storage of objects - it is about whether they need to be retrieved again for new requests.
For example, I encountered a university download service that has set both "Expires: 0" and "Cache-Control: must-revalidate" for unchanging content. The refresh_pattern "override-expires" option will allow you to store a cache of this content, but due to the "Cache-Control: must-revalidate" header the proxy server will always re-fetch the content because the server has effectively indicated (incorrectly) that it is a live data feed. By re-enabling the "ignore-must-revalidate" refresh_pattern option it is possible to override the behavior for the download service's specific URL, such that the proxy can properly cache the content and not have to constantly re-fetch a bunch of multi-gigabyte files.
Can one of the admins verify this patch?
ok to test
Fixed that the previous push was missing the #if USE_HTTP_VIOLATIONS protection in src/refresh.cc. I'm not sure what the M-failed-description tag means, presumably some sort of formatting issue?
I'm not sure what the M-failed-description tag means, presumably some sort of formatting issue?
Yes, probably. Please see Anubis docs about pull request labels (linked from each Anubis label but GitHub makes it difficult to see/use those links). I would not worry about that formatting for now, but that is your call. FWIW, the PR title/description will become the official commit message when your PR merges; any decent editor will let you quickly reformat PR description text to follow typical Git commit message guidelines (documented and enforced by Anubis).
Alex has pointed out most of the major issues already. I've just added more to the pile.
The "ignore-must-revalidate" option. The option was not actually broken, but it is not about the storage of objects - it is about whether they need to be retrieved again for new requests.
I think you have misunderstood the original removal. Unless the
Cache-Control:no-storeis present an object can always be stored. The option was removed because the useful part of its previous implementation are now the default behaviour of HTTP/1.1 Squid and all that remained was the security bugs and broken servers.
I was attempting to have the feature behave the same way it did before, but it sounds like it would do so even without the !REFRESH_OVERRIDE(ignore_must_revalidate) bit.
For example, I encountered a university download service that has set both "Expires: 0" and "Cache-Control: must-revalidate" for unchanging content.
[FWIW; have you attempted to contact that one broken service and inform them of the problem? it is often more useful to get the few broken servers fixed than to open the whole Internet up to security vulnerabilities just to let them stay broken.]
The "contact webmaster" link is broken, so we'll have to do a little digging to figure out who to talk to. We meet with the science group associated with the site on occasion, so they may be able to point us in the right direction. Is there a way that broken sites can be addressed that doesn't cause you concern about security vulnerabilities? Would moving the location of the fix to wherever Cache-Control:must-revalidate sets staleness=0 address your concern (your other comment)?
The refresh_pattern "override-expires" option will allow you to store a cache of this content,
No. The mere existence of HTTP/1.1
Cache-Controlheader makes the HTTP/1.0Expiresheader be ignored by default.So the
Cache-Control:must-revalidatealone is determining what happens. Not your refresh_pattern.
I apologize, the store-stale option was also set and I misunderstood which option was responsible for caching of the file. With just the store-stale option set the server caches the file, but the proxy server always returns TCP_REFRESH_MODIFIED for additional requests without changing the behavior of the revalidation.
but due to the "Cache-Control: must-revalidate" header the proxy server will always re-fetch
What
must-revalidatedoes is tell Squid the object should be considered stale when it arrived and that a short/fast revalidate is all that is needed to re-use it. Proxy will always fetch something for stale content, period.The real problem is that example server responding to revalidate requests by delivering the whole object in full even when it has not changed.
Yes, it would be great if the server did "the right thing" such that this was not necessary.
@compholio are you intending to continue with this PR? if not I think we should close the PR to help clear up the review queue.
It sounds like there's not a way to make folks happy, so I'll say it's probably best to drop the request.