Stale negative responses returned under unknown conditions
I have been seeing stale negative responses getting stuck in the cache on ATS 10.0.2 and 10.0.3, but have been unable to reproduce this under controlled conditions. The problem has only been seen under relatively high load, and it may occur more often if the response is cached due to an internal request (from a Lua fetch call). Whatever the cause, the cached negative response is returned for all requests. Usually the cached response disappears after restarting ATS.
Sample response:
< HTTP/1.1 400 Bad Request
< Cache-Control: public; max-age=30
< Date: Mon, 10 Feb 2025 11:40:36 GMT
< Content-Length: 0
< Age: 866
This is a relatively short-lived instance; the Age of such a response sometimes goes up to tens of hours.
The negative caching here is prompted only by the Cache-Control header. There are no cache.config pins for the origin, so this shouldn't be https://github.com/apache/trafficserver/issues/11854, and negative_caching_enabled is set to 0. The above response was from a forward-proxied request with no remap entry, but the problem also occurs for remapped reverse-proxy requests.
I have unfortunately not been able to gather debug logs for this, since it has only been seen on busy servers and seems to depend on in-memory state.
I just noticed that the service suffering from the problem was returning a Cache-Control with an invalid separator (; instead of ,). With a synthetic test, I can reproduce the error-is-cached-forever behavior when sending a similar malformed C-C.
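For anyone who wants to try this, a minimal origin that returns the same kind of response could look like the sketch below. It is only an illustration, not the exact test used: the handler name, the `start_origin` helper, and the choice of Python's standard `http.server` are assumptions. Point ATS at it as the origin and request the same URL repeatedly while watching the Age header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MalformedCCHandler(BaseHTTPRequestHandler):
    """Hypothetical origin: always answers 400 with a malformed Cache-Control."""

    def do_GET(self):
        self.send_response(400)  # also emits Server and Date headers
        # Note the ';' where the header grammar expects ','.
        self.send_header("Cache-Control", "public; max-age=30")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_origin(port=0):
    """Start the origin on 127.0.0.1 in a daemon thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), MalformedCCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    origin = start_origin(8080)
    print("origin listening on port", origin.server_address[1])
    threading.Event().wait()  # block until interrupted
```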
The fact that this C-C gets parsed to mean "cache this forever" seems like an unexpected result?
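To illustrate why the separator matters: Cache-Control directives are comma-separated, so a parser that splits on commas never sees a `max-age` directive in `public; max-age=30`. The sketch below shows only that effect. `parse_cache_control` is a hypothetical helper, not the ATS parser, and whether ATS fails in this particular way has not been confirmed.

```python
def parse_cache_control(value):
    """Split a Cache-Control value on commas into {directive: argument}.

    Hypothetical helper for illustration only; this is not how ATS parses
    the header.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


for header in ("public, max-age=30", "public; max-age=30"):
    parsed = parse_cache_control(header)
    print(repr(header), "->", parsed, "| max-age seen:", "max-age" in parsed)
```

With the well-formed value the result contains both `public` and `max-age`. With the malformed value the whole string before `=` becomes a single unknown directive name, so no freshness lifetime is found. What a cache does when it finds no usable lifetime is implementation-specific.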
Thank you for your observations. It would be good to add autests for this. @cmcfarlen or I intend to do this at some point.