
503 from (probably) failed allocation after upgrade from 6.x to 7.7

Open XANi opened this issue 4 months ago • 4 comments

Since upgrading from 6.x to 7.7.0 (Debian testing), we have been getting a spew of 503s that starts days after the cache fills up.

default.vcl.txt

/usr/sbin/varnishd -j unix,user=vcache -F -a 0.0.0.0:80 -T localhost:6082 -f /etc/varnish/default.vcl -l 81920k -p thread_pool_min=80 -p thread_pool_max=4000 -p thread_pools=2 -p default_ttl=1209600 -p nuke_limit=1000 -p max_restarts=4 -S /etc/varnish/secret -s default=malloc,8569M -s mm=malloc,17139M

The logs look like this:

*   << Request  >> 583226464 
-   Begin          req 583226327 rxreq
-   Timestamp      Start: 1761649198.288374 0.000000 0.000000
-   Timestamp      Req: 1761649198.288374 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       10.0.0.1 64710 a0
-   ReqMethod      GET
-   ReqURL         /video/abcd.mp4
-   ReqProtocol    HTTP/1.1
-   ReqHeader      host: example.com
-   ReqHeader      sec-ch-ua-platform: ...
-   ReqHeader      accept-encoding: ...
-   ReqHeader      user-agent: ...
-   ReqHeader      accept: */*
-   ReqHeader      range: bytes=44325-15139796
-   ReqHeader      if-range: "063fccff745ae4b743f8053ba65c0f7d-2"
-   ReqHeader      priority: i
-   ReqHeader      x-forwarded-for: 1.2.3.4
-   ReqUnset       x-forwarded-for: 1.2.3.4
-   ReqHeader      X-Forwarded-For: 1.2.3.4, 10.0.1.1
-   ReqHeader      Via: 1.1 cache3 (Varnish/7.7)
-   VCL_call       RECV
-   ReqUnset       accept-encoding: identity;q=1, *;q=0
-   ReqURL         /video/abcd.mp4
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 583226465 fetch
-   Timestamp      Fetch: 1761649198.310064 0.021689 0.021689
-   RespProtocol   HTTP/1.1
-   RespStatus     503
-   RespReason     Service Unavailable
-   RespHeader     Date: Tue, 28 Oct 2025 10:59:58 GMT
-   RespHeader     Server: Varnish
-   RespHeader     X-Varnish: 583226464
-   VCL_call       SYNTH
-   RespHeader     Content-Type: text/html; charset=utf-8
-   RespHeader     Retry-After: 5
-   VCL_return     deliver
-   Timestamp      Process: 1761649198.310122 0.021748 0.000058
-   RespHeader     Content-Length: 283
-   Storage        malloc Transient
-   Filters        
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1761649198.310186 0.021811 0.000063
-   ReqAcct        711 0 711 213 283 496
-   End       

When requests started failing, I noticed both a rise in allocation failures in the mm pool (which handles the big objects) and a drop in LRU nukes:

[Graph: allocation failures (c_fail) rising in the mm pool while the lru_nuked rate drops]

That (so far; I haven't run it long enough for it to happen on the other pool) only happens for big objects, and only after the instance has been running for a few days with the pool full.

Steps to Reproduce (for bugs)

No idea; we run it as a cache in front of S3 for files ranging from small CSS to video.

Varnish Cache version

7.7.0 (Debian testing)

Operating system

No response

XANi avatar Oct 28 '25 11:10 XANi

This has been discussed during bugwash. Some questions/notes:

There are two cases when c_fail of a malloc stevedore is increased:

1. Enough room could not be made by LRU nuking before nuke_limit was reached, or
2. the underlying malloc() call itself failed to return memory.

We cannot directly differentiate the two cases by the counter alone. The second case, however, would be root-caused in jemalloc not returning memory; and in many other places where we call malloc(), the call is followed by an assertion. So if we were looking at a regression in jemalloc or in our integration thereof, you would very likely see panics, because other allocation requests would also be likely to fail.

Assuming that you do not see panics and rather only the 503s, the next probable cause is that the cache was filled and not enough room could be made before nuke_limit was reached.
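One way to watch for this condition is to track the allocation-failure and nuke counters with varnishstat. A sketch, assuming the stevedore names "default" and "mm" from the varnishd command line at the top of this ticket:

```shell
# One-shot dump of allocation failures per malloc stevedore plus
# global LRU activity (n_lru_limited counts fetches that hit nuke_limit).
varnishstat -1 -f 'SMA.default.c_fail' -f 'SMA.mm.c_fail' \
            -f 'MAIN.n_lru_nuked' -f 'MAIN.n_lru_limited'

# Or watch them continuously in the curses UI:
varnishstat -f 'SMA.*.c_fail' -f 'MAIN.n_lru_*'
```

If c_fail climbs while n_lru_nuked stays flat and n_lru_limited increases, that points at the nuke_limit case rather than a failing allocator.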

It would be very helpful to get a backend varnishlog (-b or -g request option) of the failing backend request to clarify that this was indeed the case.
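For example (a sketch; "Could not get storage" is the usual FetchError text when a storage allocation fails, so the query below filters on that):

```shell
# Capture backend-side log records for failing fetches, grouped by
# request, filtered to fetches that failed to get storage.
varnishlog -b -g request -q 'FetchError ~ "storage"'
```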

This behavior has not changed from 6.x to 7.7, at least not significantly and at least not in any way we would be aware of. Hence we suspect that the reason you are seeing these errors is not a change in the code base, but rather a change in the usage pattern.

That said, there are some things you could consider checking/changing/improving:

At the marked time on your graphs, the lru_nuked rate drops to zero. This probably means that there were no objects left to remove from cache, which in turn means that all objects were "held onto" by ongoing requests. This could simply mean that your cache is too small for your workload and/or objects too big, or it could mean that you are not "seeing" client disconnects early enough. In this context, consider lowering send_timeout and/or idle_send_timeout and configuring the tcp_keepalive_* parameters.
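These parameters can be changed at runtime via varnishadm. A sketch; the values below are examples only, not recommendations, and should be tuned to your traffic:

```shell
# Tighten client-disconnect detection so stalled deliveries release
# their object references sooner (example values, not recommendations).
varnishadm param.set send_timeout 120
varnishadm param.set idle_send_timeout 30

# Detect dead TCP peers faster via keepalives.
varnishadm param.set tcp_keepalive_time 60
varnishadm param.set tcp_keepalive_intvl 10
varnishadm param.set tcp_keepalive_probes 3
```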

You are running in an "LRU only" configuration, with long TTLs and virtually no ttl-based expiry. The built-in storages have a fundamental design problem around nuke_limit, which is addressed by alternative storage engines, one of which is SLASH/. Its documentation has a section on the LRU fairness problem.

As a special case, note that the malloc storage engine supports "larger than ram" objects only for uncacheable objects and, for most real use cases, only if transit_buffer is configured. So one specific cause of the issue you are seeing could be that you are trying to serve objects which are larger than (a significant portion of) your storage. As alternative options, file storage allows you to serve objects larger than RAM by using kernel-provided paging to disk/ssd, and fellow from the aforementioned SLASH/ project achieves the same goal with a second LRU layer and also offers persistent storage.
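A sketch of what those two options look like on the varnishd command line; the flags would be merged into an existing invocation, and the path, sizes, and stevedore name "big" are example values:

```shell
# Sketch (flags only; merge into your existing varnishd command line):
# - transit_buffer caps the memory held by uncacheable pass objects
#   larger than RAM
# - a file stevedore lets cached objects exceed RAM via kernel paging
/usr/sbin/varnishd \
    -p transit_buffer=4M \
    -s default=malloc,8569M \
    -s big=file,/var/lib/varnish/bigobj.bin,100G \
    ...
```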

nigoroll avatar Nov 03 '25 16:11 nigoroll

I'll try to look at the logs and try some of the proposed options after I get back from vacation, thanks for your time :)

The objects are basically movie trailers, so nothing remotely close to the size of the memory, and we divided the RAM into 2 pools to avoid the "a big object has to evict tens of thousands of small objects" problem that otherwise happened.

Assuming that you do not see panics and rather only the 503s, the next probable cause is that the cache was filled and not enough room could be made before nuke_limit was reached.

I did not see them at the moment of writing, but since then I've had another event where it did panic:

err.log

~250 sessions were running concurrently at that time, but given the hit rate I'd expect most of them were on the same bunch of latest objects.

XANi avatar Nov 04 '25 14:11 XANi

Regarding the original issue of this ticket: you could have run into it if concurrent requests delivered objects whose total size exceeded the memory.

The panic in the log is:

Nov  2 21:53:36 cache3 varnishd[419114]: Error: Child (868913) Panic at: Sun, 02 Nov 2025 20:53:36 GMT
Nov  2 21:53:36 cache3 varnishd[419114]: Assert error in VBF_Fetch(), cache/cache_fetch.c line 1245:
Nov  2 21:53:36 cache3 varnishd[419114]:   Condition(((oc->flags & OC_F_FAILED)) != 0) not true.

... which is unrelated and very likely the issue fixed with f03919325dc75d3cae8c80d66209d4f174469381

nigoroll avatar Nov 05 '25 14:11 nigoroll

@nigoroll Hello, I have some questions about the cache eviction mechanism (LRU) in Varnish 7.5. Varnish configuration details (Varnish 7.5):

1.1 /usr/lib/systemd/system/varnish.service
[Unit]
Description=Varnish Cache, a high-performance HTTP accelerator
After=network-online.target nss-lookup.target

[Service]
Type=forking
KillMode=mixed
LimitNOFILE=131072
# ... (other OS limits)

ExecStart=/usr/sbin/varnishd \
          -a :60081 \
          -T :60082 \
          -S /etc/varnish/secret \
          -a localhost:8443,PROXY \
          -p thread_pools=4 \
          -p nuke_limit=5000 \
          -p shortlived=0.100 \
          -p thread_pool_min=100 \
          -p thread_pool_max=5000 \
          -p thread_pool_timeout=120 \
          -p tcp_keepalive_probes=3 \
          -p http_range_support=on \
          -f /etc/varnish/business.vcl \
          -P /run/varnish/varnishd.pid \
          -p feature=+http2 \
          -p vsl_space=1G \
          -s mem=malloc,20G \
          -s disk=file,/data1/data/services/varnish/storage.bin,50G

ExecReload=/usr/sbin/varnishreload

[Install]
WantedBy=multi-user.target
1.2 varnishadm param.show:
accept_filter                 -
acceptor_sleep_decay          0.9 (default)
acceptor_sleep_incr           0.000 [seconds] (default)
acceptor_sleep_max            0.050 [seconds] (default)
auto_restart                  on [bool] (default)
backend_idle_timeout          60.000 [seconds] (default)
backend_local_error_holddown  10.000 [seconds] (default)
backend_remote_error_holddown 0.250 [seconds] (default)
ban_cutoff                    0 [bans] (default)
ban_dups                      on [bool] (default)
ban_lurker_age                60.000 [seconds] (default)
ban_lurker_batch              1000 (default)
ban_lurker_holdoff            0.010 [seconds] (default)
ban_lurker_sleep              0.010 [seconds] (default)
between_bytes_timeout         60.000 [seconds] (default)
cc_command                    exec gcc -std=gnu11 -std=gnu11 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -DZ_PREFIX -fno-var-tracking-assignments %w -pthread  -fpic -shared -Wl,-x -o %o %s (default)
cc_warnings                    -Wall -Werror -Wno-error=unused-result (default)
cli_limit                     64k [bytes] (default)
cli_timeout                   60.000 [seconds] (default)
clock_skew                    10 [seconds] (default)
clock_step                    1.000 [seconds] (default)
connect_timeout               3.500 [seconds] (default)
critbit_cooloff               180.000 [seconds] (default)
debug                         none (default)
default_grace                 10.000 [seconds] (default)
default_keep                  0.000 [seconds] (default)
default_ttl                   120.000 [seconds] (default)
experimental                  none (default)
feature                       none,+http2,+validate_headers,+vcl_req_reset
fetch_chunksize               16k [bytes] (default)
fetch_maxchunksize            0.25G [bytes] (default)
first_byte_timeout            60.000 [seconds] (default)
gzip_buffer                   32k [bytes] (default)
gzip_level                    6 (default)
gzip_memlevel                 8 (default)
h2_header_table_size          4k [bytes] (default)
h2_initial_window_size        65535b [bytes] (default)
h2_max_concurrent_streams     100 [streams] (default)
h2_max_frame_size             16k [bytes] (default)
h2_max_header_list_size       2147483647b [bytes] (default)
h2_rapid_reset                1.000 [seconds] (default)
h2_rapid_reset_limit          100 (default)
h2_rapid_reset_period         60.000 [seconds] (default)
h2_rx_window_increment        1M [bytes] (default)
h2_rx_window_low_water        10M [bytes] (default)
h2_rxbuf_storage              Transient (default)
h2_window_timeout             5.000 [seconds] (default)
http1_iovs                    64 [struct iovec (=16 bytes)] (default)
http_gzip_support             on [bool] (default)
http_max_hdr                  64 [header lines] (default)
http_range_support            on [bool] (default)
http_req_hdr_len              8k [bytes] (default)
http_req_size                 32k [bytes] (default)
http_resp_hdr_len             8k [bytes] (default)
http_resp_size                32k [bytes] (default)
idle_send_timeout             60.000 [seconds] (default)
listen_depth                  1024 [connections] (default)
lru_interval                  2.000 [seconds] (default)
max_esi_depth                 5 [levels] (default)
max_restarts                  4 [restarts] (default)
max_retries                   4 [retries] (default)
max_vcl                       100 (default)
max_vcl_handling              1 (default)
nuke_limit                    5000 [allocations]
pcre2_depth_limit             20 (default)
pcre2_jit_compilation         on [bool] (default)
pcre2_match_limit             10000 (default)
ping_interval                 3 [seconds] (default)
pipe_sess_max                 0 [connections] (default)
pipe_task_deadline            0.000 [seconds] (default)
pipe_timeout                  60.000 [seconds] (default)
pool_req                      10,100,10 (default)
pool_sess                     10,100,10 (default)
pool_vbo                      10,100,10 (default)
prefer_ipv6                   off [bool] (default)
rush_exponent                 3 [requests per request] (default)
send_timeout                  600.000 [seconds] (default)
shortlived                    0.100 [seconds]
sigsegv_handler               on [bool] (default)
startup_timeout               0.000 [seconds] (default)
syslog_cli_traffic            on [bool] (default)
tcp_fastopen                  off [bool] (default)
tcp_keepalive_intvl           75.000 [seconds] (default)
tcp_keepalive_probes          3 [probes]
tcp_keepalive_time            60.000 [seconds] (default)
thread_pool_add_delay         0.000 [seconds] (default)
thread_pool_destroy_delay     1.000 [seconds] (default)
thread_pool_fail_delay        0.200 [seconds] (default)
thread_pool_max               5000 [threads] (default)
thread_pool_min               100 [threads] (default)
thread_pool_reserve           0 [threads] (default)
thread_pool_stack             80k [bytes] (default)
thread_pool_timeout           120.000 [seconds]
thread_pool_watchdog          60.000 [seconds] (default)
thread_pools                  4 [pools]
thread_queue_limit            20 [requests] (default)
thread_stats_rate             10 [requests] (default)
timeout_idle                  5.000 [seconds] (default)
timeout_linger                0.050 [seconds] (default)
transit_buffer                0b [bytes] (default)
vary_notice                   10 [variants] (default)
vcc_feature                   none,+err_unref,+unsafe_path (default)
vcl_cooldown                  600.000 [seconds] (default)
vcl_path                      /etc/varnish:/usr/share/varnish/vcl (default)
vmod_path                     /usr/lib64/varnish/vmods (default)
vsl_buffer                    16k [bytes] (default)
vsl_mask                      all,-Debug,-ObjProtocol,-ObjStatus,-ObjReason,-ObjHeader,-ExpKill,-WorkThread,-Hash,-VfpAcct,-H2RxHdr,-H2RxBody,-H2TxHdr,-H2TxBody,-VdpAcct (default)
vsl_reclen                    255b [bytes] (default)
vsl_space                     1G [bytes]
vsm_free_cooldown             60.000 [seconds] (default)
workspace_backend             96k [bytes] (default)
workspace_client              96k [bytes] (default)
workspace_session             0.75k [bytes] (default)
workspace_thread              2k [bytes] (default)
1.3 caching.vcl
// vcl_recv cache decision
sub caching_recv_decision {
    // ... HTML/JSON pass
    // ... non-GET/HEAD pass
    // ... /api/ pass

    // Static assets allow hash (cache lookup)
    if (req.url ~ "\.(css|js|jpg|jpeg|gif|png|ico|webp|svg|woff|woff2|ttf|eot|mov|mp4|webm|mp3|ogg|pdf|zip)(\?.*)?$") {
        unset req.http.Cookie;
        return (hash); 
    }
    // Optimization 1: Dynamic content bypass (pass)
    return (pass);
}

// vcl_backend_response cache decision
sub caching_backend_decision {
    // Static assets long TTL logic
    if (bereq.url ~ "\.(css|js|jpg|jpeg|gif|png|ico|webp|svg|woff|woff2|ttf|eot|mov|mp4|webm|mp3|ogg|pdf|zip)(\?.*)?$") {
        set beresp.ttl = 30d;
        // Optimization 2: 60MB disk split
        if (std.integer(beresp.http.Content-Length, 0) > 62914560) {
            set beresp.http.X-Cache-Storage = "disk";
            set beresp.storage = storage.disk; // Write to 50G disk
        } else {
            set beresp.http.X-Cache-Storage = "memory";
            set beresp.storage = storage.mem; // Write to 20G memory
        }
    }
    // Optimization 1 effect: Non-static content security check
    else {
        set beresp.uncacheable = true;
        return (deliver);
    }
    // ... (other error/size checks)
}

// Hash logic
sub caching_hash_logic {
    // Optimization 3: Go-param caching for static files
    if (req.url ~ "\.(css|js|jpg|jpeg|gif|png|ico|webp|svg|woff|woff2|ttf|eot|mov|mp4|webm|mp3|ogg|pdf|zip)(\?.*)?$") {
        hash_data(regsub(req.url, "\?.*$", "")); 
    } else {
        // Dynamic content uses full URL
        hash_data(req.url); 
    }
    hash_data(req.http.host);
}

My Varnish node frequently experiences its memory filling up completely. After the memory becomes full, subsequent requests begin to fail, as evidenced by the statistic c_fail > 0 (allocation failures). The LRU cleanup mechanism then triggers, but it only evicts a small portion of objects at a time, which fails to solve the root problem, and the memory fills up again shortly after.

My question is: are there any parameters in Varnish 7.5 that can trigger the LRU cleanup mechanism when memory utilization exceeds a specific threshold, for example 80%? Alternatively, do you have any other optimization strategies that could fundamentally resolve this issue of the memory constantly filling up?

waynebzhang avatar Dec 08 '25 13:12 waynebzhang

Re @waynebzhang: Your report sounds similar to what has been discussed in this ticket, and I have nothing to add to my previous note.

I will edit your post because it contains a lot of detail

nigoroll avatar Dec 11 '25 17:12 nigoroll