ledge stale content not being updated

I have an upstream that has a short lived max-age and slightly longer stale headers, that once I get past that time, content isn't refreshed.

The cache headers from the upstream are:

< Cache-Control: max-age=0, s-maxage=15, stale-while-revalidate=60, stale-if-error=900, private

I can see ledge triggering a background refresh, but when it tries to fetch the new content, it returns the stale content again (no matter how long I wait).

A dumb fix is attached.

stale.patch

Mar 14 '23 04:03 csmithATsquiz

I can see ledge triggering a background refresh, but when it tries to fetch the new content, it returns the stale content again (no matter how long I wait).

By "returns the stale content" do you mean that Ledge continues to serve the stale content, even though the background task correctly fetched fresh content? Or that the newly refreshed content somehow appears to be the old stale version?

Is the origin another Ledge instance by any chance? Or perhaps another system capable of serving stale-on-error?

Ledge won't serve stale content if it has fresh content. So there should be no need to override the can_serve_stale behaviour, assuming it is able to get fresh content in the background. So either the background revalidation is failing somehow, or perhaps your origin is failing in some way and returning "stale-on-error" content itself?

Mar 14 '23 08:03 pintsized

The background task returns the stale content from ledge that it already had, it never fetches from the origin.

Mar 14 '23 22:03 csmithATsquiz

If I apply this patch instead (so I can log the state and event history, but keep the extra X-Revalidate header), I get this output:

2023/03/14 22:44:32 [error] 18833#0: *36954658 [lua] handler.lua:557: revalidation_data(): in revalidation_data:: revalidation params  : {"connect_timeout":5000,"read_timeout":900000,"server_addr":"my.own.ip","server_port":"443","send_timeout":2000,"uri":"\/my\/url","scheme":"https","keepalive_timeout":75000,"keepalive_poolsize":64}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18833#0: *36954658 [lua] handler.lua:558: revalidation_data(): in revalidation_data:: revalidation headers : {"host":"my.host.name","X-Revalidate":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18833#0: *36954658 [lua] handler.lua:943: serve(): in serve:: state_history:{"checking_origin_mode":true,"checking_can_serve_stale":true,"considering_gzip_inflate":true,"preparing_response":true,"checking_method":true,"considering_esi_process":true,"serving":true,"checking_cache":true,"checking_request":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18833#0: *36954658 [lua] handler.lua:944: serve(): in serve:: event_history:{"cache_accepted":true,"init":true,"response_ready":true,"cacheable_method":true,"cache_expired":true,"gzip_inflate_disabled":true,"can_serve_stale_while_revalidate":true,"esi_process_not_required":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
x.x.x.x - [14/Mar/2023:22:44:32 +0000] "GET https://my.host.name/my/url HTTP/1.1" 200 695 "-" "curl/7.86.0" [0.017] HIT from staging.squizedge.net [AU -33.4940,143.2104]
2023/03/14 22:44:32 [error] 18832#0: *36954674 [lua] handler.lua:557: revalidation_data(): in revalidation_data:: revalidation params  : {"connect_timeout":5000,"read_timeout":900000,"server_addr":"202.9.94.178","server_port":"443","send_timeout":2000,"uri":"\/my\/url","scheme":"https","keepalive_timeout":75000,"keepalive_poolsize":64}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18832#0: *36954674 [lua] handler.lua:558: revalidation_data(): in revalidation_data:: revalidation headers : {"host":"my.host.name","X-Revalidate":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18832#0: *36954674 [lua] handler.lua:943: serve(): in serve:: state_history:{"checking_origin_mode":true,"checking_can_serve_stale":true,"considering_gzip_inflate":true,"preparing_response":true,"checking_method":true,"considering_esi_process":true,"serving":true,"checking_cache":true,"checking_request":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
2023/03/14 22:44:32 [error] 18832#0: *36954674 [lua] handler.lua:944: serve(): in serve:: event_history:{"cache_accepted":true,"init":true,"response_ready":true,"cacheable_method":true,"cache_expired":true,"gzip_inflate_disabled":true,"can_serve_stale_while_revalidate":true,"esi_process_not_required":true}, client: x.x.x.x, server: squizedge.net, request: "GET /my/url HTTP/1.1", host: "my.host.name"
x.x.x.x - [14/Mar/2023:22:44:32 +0000] "GET https://my.host.name/my/url HTTP/1.1" 200 695 "-" "lua-resty-http/0.14 (Lua) ngx_lua/10015 ledge_revalidate/2.3.0" [0.015] HIT from staging.squizedge.net [AU -33.4940,143.2104]

debug.patch

so I get the original request, and then the revalidation attempt also gets a 'HIT' back from ledge, which is my problem.

Mar 14 '23 22:03 csmithATsquiz

I see. You shouldn't need to be doing this with custom headers of course. The semantics are there already; the background request should include max-stale=0 in its request headers, and this should prevent ledge from serving stale, causing an upstream fetch. Your approach will indeed force background revalidation, regardless of whether its necessary or not, but the HTTP semantics should be doing this for you already.

Hmm, which is an interesting point. That is, I think you could see the behaviour you are describing if the content has been refreshed by some other request in the mean time. The background task sets max-age=0 in its request headers, this is compared against remaining_ttl which is assumed to be negative (i.e. expired). If it's not expired, it will just serve the valid resource to the background task. Any chance that's what you're seeing?

At the risk of hand-waving, there are a lot of tests for this behaviour. It's been so many years since I've thought about it though, that it's hard for me to be certain about much.

If it still looks like a bug to you, I'd start by trying to recreate the conditions in the test harness (see the existing integration tests for examples: https://github.com/ledgetech/ledge/blob/master/t/02-integration/stale-while-revalidate.t). I'd take the request and response headers from your real world scenario and plug them into a similar test so that you can inspect the results in isolation. Take note of the wait directives in those tests, which allow the background tasks to complete before moving onto the next test.

In summary though; I agree you should not be seeing the behaviour that you appear to be. It was designed to do what you expect, using HTTP semantics to ensure background tasks never get stale content. They will get fresh content if some other thread has already refreshed the content, which is correct (i.e. harmless) and helps protect the origin.

Mar 15 '23 09:03 pintsized

ledge ledge copied to clipboard

stale content not being updated

ledge
ledge copied to clipboard