File Backend: 503 or Stream closed, "Could not get storage"

Open MarcWort opened this issue 11 months ago • 14 comments

Expected Behavior

With do_stream, nuke old objects and keep streaming; without do_stream, nuke old objects on response and start sending to the client.

Current Behavior

I run a pretty normal setup serving static (video) files, ~3GB each, from a file storage backend. If the storage is full, requests either get a 503 response with do_stream = false, or an unexpected connection close with do_stream = true. In the logs I see

-   Storage        file s0
-   Fetch_Body     3 length stream
-   FetchError     Could not get storage

Possible Solution

No response

Steps to Reproduce (for bugs)

  1. preallocate cache file fallocate -l 5G /var/cache/varnish/video
  2. start varnishd -f video.vcl -i video -p timeout_idle=300 -p listen_depth=5000000 -p thread_pool_min=500 -s file,/var/cache/varnish/video -p default_ttl=600 -p default_grace=172800 -p nuke_limit=999999 -p send_timeout=7200
  3. curl --limit-rate 1000k -o /dev/null http://localhost/video1.mp4 (3GB works fine)
  4. curl --limit-rate 1000k -o /dev/null http://localhost/video2.mp4 (3GB, connection closed or immediate 503, depending on do_stream)
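
The issue references video.vcl in step 2 without including it. A minimal sketch of what such a config could look like (the backend address and the do_stream setting are assumptions, not taken from the report):

vcl 4.1;

backend video {
	.host = "127.0.0.1";	# assumed origin serving the static videos
	.port = "8080";
}

sub vcl_backend_response {
	# Toggling this switches between the two failure modes described
	# above: immediate 503 (false) vs. connection closed mid-stream (true).
	set beresp.do_stream = true;
}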

Context

Varnish Cache version

varnishd (varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1)

Operating system

Debian 12

Source of binary packages used (if any)

https://packagecloud.io/varnishcache/varnish75/packages/debian/bookworm/varnish_7.5.0-1~bookworm_amd64.deb?distro_version_id=215

MarcWort avatar Jan 24 '25 15:01 MarcWort

It seems like you are trying to deliver 2 files of 3GB each from a 5GB file storage? The file storage does not support this scenario, because it always keeps entire files in cache, and I am not aware of any storage engine for Varnish-Cache that would support this kind of partial caching. SLASH/fellow supports partial memory caching[^1], but even it requires enough disk to accommodate all concurrently used objects. We will improve the documentation in this regard.

[^1]: which the file storage also implements implicitly, because it uses virtual memory

nigoroll avatar Jan 27 '25 15:01 nigoroll

Hey @nigoroll,

shouldn't the older object be nuked? We also tried malloc, and it appeared to behave the same way as the file storage, at least in a test case.

idl0r avatar Jan 27 '25 15:01 idl0r

@idl0r yes, the older object should get nuked, but that can only happen once it is not used any more. This is what I was referring to with "it seems like...", it is not clear to me if you are accessing the two objects simultaneously or in sequence. If in sequence, I agree that it should work.

nigoroll avatar Jan 27 '25 15:01 nigoroll

In our test cases it was in sequence, tested with e.g. curl and a few seconds of delay between the requests. The old object wasn't nuked; we got the 503 instead.

idl0r avatar Jan 27 '25 15:01 idl0r

We can also share a varnishlog capture (partly redacted) or the config, preferably off GitHub so we could share it almost as it is, if you want.

idl0r avatar Jan 27 '25 15:01 idl0r

I tried to reproduce this and failed.

  • apache listening on 127.0.0.1 and delivering two 3g files (hardlinked)
$ curl -I http://localhost:80/3g
HTTP/1.1 200 OK
Date: Mon, 27 Jan 2025 15:34:35 GMT
Server: Apache/2.4.62 (Debian)
Last-Modified: Mon, 27 Jan 2025 15:29:49 GMT
ETag: "c0000000-62cb1bf362f9a"
Accept-Ranges: bytes
Content-Length: 3221225472

$ curl -I http://localhost:80/3g_2
HTTP/1.1 200 OK
Date: Mon, 27 Jan 2025 15:34:37 GMT
Server: Apache/2.4.62 (Debian)
Last-Modified: Mon, 27 Jan 2025 15:29:49 GMT
ETag: "c0000000-62cb1bf362f9a"
Accept-Ranges: bytes
Content-Length: 3221225472

  • varnish configured with 5g storage file
$ /tmp/sbin/varnishd -s file,/home/slink/tmp/stv_file  -n /tmp/t -b 127.0.0.1:80 -a 127.0.0.1:8081
Child launched OK
$ ls -hals /home/slink/tmp/stv_file
5,1G -rw-r--r-- 1 slink slink 5,0G Jan 27 16:32 /home/slink/tmp/stv_file
$ varnishstat -1f SMF.*
SMF.s0.c_req                 0         0.00 Allocator requests
SMF.s0.c_fail                0         0.00 Allocator failures
SMF.s0.c_bytes               0         0.00 Bytes allocated
SMF.s0.c_freed               0         0.00 Bytes freed
SMF.s0.g_alloc               0          .   Allocations outstanding
SMF.s0.g_bytes               0          .   Bytes outstanding
SMF.s0.g_space      5368709120          .   Bytes available
SMF.s0.g_smf                 8          .   N struct smf
SMF.s0.g_smf_frag            0          .   N small free smf
SMF.s0.g_smf_large           8          .   N large free smf
  • works
$ curl -svo t http://localhost:8081/3g
*   Trying 127.0.0.1:8081...
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET /3g HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Mon, 27 Jan 2025 15:37:00 GMT
< Server: Apache/2.4.62 (Debian)
< Last-Modified: Mon, 27 Jan 2025 15:29:49 GMT
< ETag: "c0000000-62cb1bf362f9a"
< Content-Length: 3221225472
< X-Varnish: 2
< Age: 0
< Via: 1.1 haggis21 (Varnish/trunk)
< Accept-Ranges: bytes
< Connection: keep-alive
< 
{ [32768 bytes data]
* Connection #0 to host localhost left intact
$ ls -hals t
512 -rw-r--r-- 1 slink slink 3,0G Jan 27 16:37 t
$ varnishstat -1f SMF.*
SMF.s0.c_req                13         0.05 Allocator requests
SMF.s0.c_fail                0         0.00 Allocator failures
SMF.s0.c_bytes      3221356544  12732634.56 Bytes allocated
SMF.s0.c_freed               0         0.00 Bytes freed
SMF.s0.g_alloc              13          .   Allocations outstanding
SMF.s0.g_bytes      3221356544          .   Bytes outstanding
SMF.s0.g_space      2147352576          .   Bytes available
SMF.s0.g_smf                21          .   N struct smf
SMF.s0.g_smf_frag            0          .   N small free smf
SMF.s0.g_smf_large           8          .   N large free smf
$ curl -svo t http://localhost:8081/3g_2
*   Trying 127.0.0.1:8081...
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET /3g_2 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Mon, 27 Jan 2025 15:37:21 GMT
< Server: Apache/2.4.62 (Debian)
< Last-Modified: Mon, 27 Jan 2025 15:29:49 GMT
< ETag: "c0000000-62cb1bf362f9a"
< Content-Length: 3221225472
< X-Varnish: 5
< Age: 0
< Via: 1.1 haggis21 (Varnish/trunk)
< Accept-Ranges: bytes
< Connection: keep-alive
< 
{ [32768 bytes data]
* Connection #0 to host localhost left intact
$ ls -hals t
512 -rw-r--r-- 1 slink slink 3,0G Jan 27 16:37 t
$ varnishstat -1f SMF.*
SMF.s0.c_req               115         0.43 Allocator requests
SMF.s0.c_fail               76         0.28 Allocator failures
SMF.s0.c_bytes      6442713088  23861900.33 Bytes allocated
SMF.s0.c_freed      3221356544  11930950.16 Bytes freed
SMF.s0.g_alloc              26          .   Allocations outstanding
SMF.s0.g_bytes      3221356544          .   Bytes outstanding
SMF.s0.g_space      2147352576          .   Bytes available
SMF.s0.g_smf                31          .   N struct smf
SMF.s0.g_smf_frag            1          .   N small free smf
SMF.s0.g_smf_large           4          .   N large free smf

nigoroll avatar Jan 27 '25 15:01 nigoroll

@nigoroll so I was looking into this again, because I finally had some time, and it still happens. I gave Varnish 4G of malloc storage, did some initial downloads/tests, and all was fine. This time I even used only one video, about 2.7G in size. When I started today, after the instance had been idle overnight, some chunks(?) seem to have been expired/nuked for whatever reason, and then I noticed that failing partial content request again. There were no parallel requests, especially not for that test video file. A restart immediately fixed it.

I then reduced the TTL for my test file to about 2m, did 2 downloads (one to trigger the backend fetch, another to make sure it's served from the cache), waited about 5m, and tried again. I could reliably reproduce it. So it looks like Varnish is doing that partial stuff, not curl/wget; I tested both.

So: a 2.7G video/test file and 4G of malloc. The file wouldn't fit twice, but there is plenty of space for that single purpose/test. No parallel requests so far. It also looks like it is related to the TTL.

*   << BeReq    >> 32771     
-   Begin          bereq 32770 fetch
-   VCL_use        boot
-   Timestamp      Start: 1762961441.904593 0.000000 0.000000
-   BereqMethod    GET
-   BereqURL       /testfile.mp4
-   BereqProtocol  HTTP/1.1
-   BereqHeader    host: example.com
-   BereqHeader    user-agent: Wget/1.21.3
-   BereqHeader    accept: */*
-   BereqHeader    https: true
-   BereqHeader    x-forwarded-proto: https
-   BereqHeader    customer-forwarded-proto: https
-   BereqHeader    x-forwarded-port: 443
-   BereqHeader    x-example-fe_name: somefrontend
-   BereqHeader    x-example-fe_dst: 1.2.3.4
-   BereqHeader    Via: 1.1 somecust_prod (Varnish/7.7)
-   BereqHeader    X-Forwarded-For: 1.2.3.4
-   BereqHeader    Accept-Encoding: gzip
-   BereqHeader    If-Modified-Since: Wed, 22 Jan 2025 13:03:40 GMT
-   BereqHeader    If-None-Match: "af7dbd2d-62c4b1f540de4"
-   BereqHeader    X-Varnish: 32771
-   VCL_call       BACKEND_FETCH
-   VCL_return     fetch
-   Timestamp      Fetch: 1762961441.904631 0.000038 0.000038
-   Timestamp      Connected: 1762961441.904711 0.000118 0.000080
-   BackendOpen    24 backend_somecust 127.0.0.1 10620 127.0.0.1 28462 connect
-   Timestamp      Bereq: 1762961441.904751 0.000158 0.000040
-   BerespProtocol HTTP/1.1
-   BerespStatus   304
-   BerespReason   Not Modified
-   BerespHeader   date: Wed, 12 Nov 2025 15:30:41 GMT
-   BerespHeader   set-cookie: INGRESSCOOKIE_static-delivery=...; Path=/; HttpOnly
-   BerespHeader   last-modified: Wed, 22 Jan 2025 13:03:40 GMT
-   BerespHeader   etag: "af7dbd2d-62c4b1f540de4"
-   BerespHeader   accept-ranges: bytes
-   BerespHeader   content-security-policy: default-src 'self' data: blob: 'unsafe-inline' 'unsafe-eval' *.example.de *.example.com *.googleapis.com;
-   BerespHeader   x-frame-options: SAMEORIGIN
-   BerespHeader   x-xss-protection: 1; mode=block
-   BerespHeader   x-content-type-options: nosniff
-   BerespHeader   x-permitted-cross-domain-policies: none
-   Timestamp      Beresp: 1762961441.908601 0.004008 0.003849
-   TTL            RFC 1200 86400 0 1762961442 1762961442 1762961441 0 0 cacheable
-   BerespProtocol HTTP/1.1
-   BerespStatus   200
-   BerespReason   OK
-   BerespHeader   content-type: application/octet-stream
-   BerespHeader   content-length: 2944253229
-   BerespHeader   X-Cacheable: Yes: Forced Testing GH 4256
-   VCL_call       BACKEND_RESPONSE
-   TTL            VCL 60 86400 0 1762961442 cacheable
-   BerespUnset    X-Cacheable: Yes: Forced Testing GH 4256
-   BerespHeader   X-Cacheable: Yes: Forced Testing GH 4256
-   TTL            VCL 60 86400 0 1762961442 cacheable
-   VCL_return     deliver
-   Timestamp      Process: 1762961441.908621 0.004028 0.000019
-   Storage        malloc s0
-   FetchError     Could not get storage
-   BackendClose   24 backend_somecust recycle
-   BereqAcct      530 0 530 659 0 659
-   End            

*   << Request  >> 32770     
-   Begin          req 32769 rxreq
-   Timestamp      Start: 1762961441.904471 0.000000 0.000000
-   Timestamp      Req: 1762961441.904471 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       1.2.3.4 24448 a1
-   ReqMethod      GET
-   ReqURL         /testfile.mp4
-   ReqProtocol    HTTP/1.1
-   ReqHeader      host: example.com
-   ReqHeader      user-agent: Wget/1.21.3
-   ReqHeader      accept: */*
-   ReqHeader      accept-encoding: identity
-   ReqHeader      https: true
-   ReqHeader      x-forwarded-proto: https
-   ReqHeader      example-forwarded-proto: https
-   ReqHeader      x-forwarded-port: 443
-   ReqHeader      x-example-fe_name: somefrontend
-   ReqHeader      x-example-fe_dst: 1.2.3.4
-   ReqHeader      x-forwarded-for: 1.2.3.4
-   ReqHeader      connection: close
-   ReqUnset       x-forwarded-for: 1.2.3.4
-   ReqHeader      X-Forwarded-For: 1.2.3.4, 1.2.3.4
-   ReqHeader      Via: 1.1 somecust_prod (Varnish/7.7)
-   VCL_call       RECV
-   VCL_Log        XID: n095142-32770
-   VCL_Log        DEBUG: X-Forwarded-For: outside or not set: X-Forwarded-For=1.2.3.4
-   ReqUnset       X-Forwarded-For: 1.2.3.4, 1.2.3.4
-   ReqHeader      X-Forwarded-For: 1.2.3.4
-   VCL_Log        DEBUG: vcl_recv(): Backend backend_somecust is healthy
-   VCL_return     hash
-   ReqUnset       accept-encoding: identity
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 32771 fetch
-   Timestamp      Fetch: 1762961441.908734 0.004262 0.004262
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     date: Wed, 12 Nov 2025 15:30:41 GMT
-   RespHeader     set-cookie: INGRESSCOOKIE_static-delivery=...; Path=/; HttpOnly
-   RespHeader     last-modified: Wed, 22 Jan 2025 13:03:40 GMT
-   RespHeader     etag: "af7dbd2d-62c4b1f540de4"
-   RespHeader     content-security-policy: default-src 'self' data: blob: 'unsafe-inline' 'unsafe-eval' *.example.de *.example.com *.googleapis.com;
-   RespHeader     x-frame-options: SAMEORIGIN
-   RespHeader     x-xss-protection: 1; mode=block
-   RespHeader     x-content-type-options: nosniff
-   RespHeader     x-permitted-cross-domain-policies: none
-   RespHeader     content-type: application/octet-stream
-   RespHeader     content-length: 2944253229
-   RespHeader     X-Cacheable: Yes: Forced Testing GH 4256
-   RespHeader     X-Varnish: 32770
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 somecust_prod (Varnish/7.7)
-   RespHeader     Accept-Ranges: bytes
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   VCL_Log        DEBUG: vcl_deliver(): Hits on /testfile.mp4: 0
-   VCL_Log        DEBUG: vcl_deliver(): WARNING: Object is X-Cachable='Yes: Forced Testing GH 4256' but Set-Cookie='INGRESSCOOKIE_static-delivery=...; Path=/; HttpOnly' will be delivered anyway!
-   RespUnset      X-Cache: MISS
-   RespUnset      X-Cacheable: Yes: Forced Testing GH 4256
-   RespUnset      X-Varnish: 32770
-   RespUnset      Via: 1.1 somecust_prod (Varnish/7.7)
-   VCL_return     deliver
-   Timestamp      Process: 1762961441.908759 0.004287 0.000025
-   Filters        
-   RespHeader     Connection: close
-   Timestamp      Resp: 1762961443.336858 1.432386 1.428098
-   ReqAcct        407 0 407 744 1073741824 1073742568
-   End            

*   << Session  >> 32769     
-   Begin          sess 0 PROXY
-   SessOpen       0.0.0.0 0 a1 0.0.0.0 0 1762961441.904436 17
-   Proxy          2 1.2.3.4 24448 1.2.3.4 443
-   Link           req 32770 rxreq
-   SessClose      REM_CLOSE 1.461
-   End

idl0r avatar Nov 12 '25 15:11 idl0r

Ok, I may have something. We have some very long default grace times, like 24h or so (-p default_grace=...). I reduced that to 125s while the object TTL is still at 2m, and that works fine. If the grace is longer and the cache is actually full, it fails. We have those long grace periods to serve stale content in case of backend downtime/errors, and it seems objects are really kept that long. The old/same object, which cannot be served again and needs to be re-fetched, seems to be kept in the cache alongside the new copy instead of being renewed/replaced. Is it like that?

idl0r avatar Nov 12 '25 16:11 idl0r

The problem you are seeing here is related to the way we deal with revalidation of stale objects.

When a stale object is revalidated (varnish receives a 304 from the backend), a new object is created, and the body of the stale object is copied to the new one. At this point in time, both objects need to be present simultaneously, but since your stevedore does not have the capacity for both, you are seeing the FetchError Could not get storage.

Here is a test case that reproduces the issue:

varnishtest "object revalidation out of space"

server s0 {
	rxreq
	txresp -hdr "Etag: abcd" -bodylen 838860
	rxreq
	expect req.http.If-None-Match == abcd
	txresp -status 304
} -start

varnish v1 -arg "-smalloc,1M" -vcl+backend {
	sub vcl_backend_response {
		set beresp.ttl = 0.01s;
		set beresp.grace = 0s;
		set beresp.keep = 10s;
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.bodylen == 838860
	txreq
	rxresp
	expect resp.status == 200
} -run

I don't have any recommendations for now except not using keep/grace for these objects. I'll reopen the issue to let others share their opinions.
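
A sketch of that workaround in VCL, assuming the affected objects can be identified by Content-Length (the 1GB threshold is illustrative):

import std;

sub vcl_backend_response {
	# Workaround sketch: drop keep/grace for very large bodies so a 304
	# revalidation never needs to hold two copies in storage at once.
	# Chunked responses without Content-Length fall back to 0B and are
	# left alone.
	if (std.bytes(beresp.http.Content-Length, 0B) > 1GB) {
		set beresp.grace = 0s;
		set beresp.keep = 0s;
	}
}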

walid-git avatar Nov 12 '25 17:11 walid-git

Is it possible to somehow increase/set grace only when the backend is not healthy? Like not per request, but when the health check figures out the backend is down: just keep everything you have in cache and use that. That would be really nice.

idl0r avatar Nov 17 '25 14:11 idl0r

Maybe this piece of the documentation is what you are looking for? The idea is to have a long grace and/or TTL, and reduce it when probes indicate that the backend is working.
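
For reference, the pattern from that documentation section looks roughly like this (the 10s cap is illustrative, and it assumes the backend has a probe defined):

import std;

sub vcl_recv {
	if (std.healthy(req.backend_hint)) {
		# Probe says the backend is up: cap the effective grace so
		# stale objects get revalidated promptly.
		set req.grace = 10s;
	}
	# When the probe marks the backend sick, req.grace stays unset and
	# the full (long) object grace applies, so stale content is served.
}

This keeps the long grace as a safety net for outages while avoiding long-lived stale copies during normal operation.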

hermunn avatar Nov 17 '25 14:11 hermunn

Hm, that could indeed work. I'll give it a try for some tests. Thanks!

idl0r avatar Nov 17 '25 14:11 idl0r

So, yeah, that kind of works. It still has the same problem, but only if the backend was down, which should reduce the risk to a minimum.

idl0r avatar Nov 17 '25 15:11 idl0r

To summarize, as walid has explained, when we re-validate an object (process a 304 response to a conditional GET), we create a new cache object and copy the body data from the existing cache object.

This is actually advantageous if the original body was received with chunked encoding, because for the "304 copy" we now have the exact content length and hence can use memory allocations with a better layout. Also, copying results in more overall "cache turnover", which can help fight fragmentation. In general, the generic copy is simple and reduces storage engine complexity.

The drawbacks of the current implementation are, however, that the copy has a higher cost than some form of reference mechanism would have and, as discussed here, momentarily needs additional cache space.

Already when backend 304 handling was introduced years ago, we discussed the option of introducing a body-stealing object API. It could basically work as follows:

  • core code calls something like ObjCloneBody(wrk, newoc, oldoc)
  • if newoc and oldoc use different storage engines, or the storage engine does not support it, core code uses something similar to the current code in vbf_stp_condfetch() to copy the body
  • else a new object method objclonebody_f is called

The implementation of objclonebody_f is entirely up to the storage engine, but it needs to ensure that the old object remains valid for as long as references on it exist, in particular for ongoing iterators and VAI leases.

For the built-in storage engines using SML, I think that the clone function might be as simple as setting a "do not free the body segments" flag on the old object and copying the head, but I am not sure if this works.

nigoroll avatar Nov 19 '25 15:11 nigoroll