Assert in HSH_Lookup with Varnish 7.2.1
Hi,
I'm using Varnish with a malloc storage backend and a file storage backend for static files. The settings are the following:
ExecStart=/usr/sbin/varnishd -a :80 -p workspace_session=1024 -p workspace_thread=4096 -p workspace_client=131072 -p workspace_backend=131072 -p listen_depth=16383 -p http_resp_hdr_len=65536 -p http_resp_size=98304 -p workspace_backend=131072 -p thread_pool_min=200 -p thread_pool_max=5000 -p feature=+http2 -P %t/%N/varnishd.pid -T localhost:6082 -f /etc/varnish/default.vcl -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G
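(Not shown above: the second storage is selected from VCL. A minimal sketch of how that can be done is below; the Content-Type condition is purely illustrative, not the exact rule used in my configuration.)
# Illustrative sketch only: send images to the "static" stevedore declared
# with -s static=..., and leave everything else in the default malloc storage.
sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^image/") {
        set beresp.storage = storage.static;
    }
}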
Expected Behavior
No crash
Current Behavior
I noticed varnishd complaining in the systemd log:
Active: active (running) since lun. 2022-12-05 20:39:57 CET; 1 day 18h ago
déc. 06 21:25:33 varnishd[2813]: Child (8130) Started
déc. 06 21:25:33 varnishd[2813]: Child (8130) said Child starts
déc. 06 21:25:33 varnishd[2813]: Child (8130) said
So the first warning happened a day after starting the server. Then, a few hours later, the assert appeared.
This is happening on an ARM architecture (AWS t4g servers) on Amazon Linux 2 with kernel 5.15.75-48.135. I'm using four vmods: directors, header, std & xkey. I'm fetching the binaries from https://packagecloud.io/install/repositories/varnishcache/varnish72/script.rpm.sh & https://github.com/varnish/varnish-modules/releases/download/0.21.0/varnish-modules-0.21.0.tar.gz
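For reference, the vmods are loaded with the usual import statements (sketch):
import std;
import directors;
import header;
import xkey;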
My vcl_hash is not that complicated:
sub vcl_hash {
# Called after vcl_recv to create a hash value for the request. This is used as a key
# to look up the object in Varnish.
# Some generic URL manipulation, useful for all templates that follow
# First remove the Google Analytics added parameters, useless for our backend
if (req.url ~ "(\?|&)(mc_[a-z]+|utm_[a-z]+|eff_[a-z0-9]+|fbclid|gclid|cx|ie|cof|siteurl|ranMID|ranEAID|ranSiteID)=") {
set req.http.hash_url = regsuball(req.url, "(\?|&)(mc_[a-z]+|utm_[a-z]+|eff_[a-z0-9]+|fbclid|gclid|cx|ie|cof|siteurl|ranMID|ranEAID|ranSiteID)=[^&]+", "\1");
set req.http.hash_url = regsub(req.http.hash_url, "&+$", "");
set req.http.hash_url = regsub(req.http.hash_url, "&+", "&");
set req.http.hash_url = regsub(req.http.hash_url, "\?&", "?");
set req.http.hash_url = regsub(req.http.hash_url, "\?$", "");
} else {
set req.http.hash_url = req.url;
}
hash_data(req.http.hash_url);
if (req.http.host) {
hash_data(req.http.host);
} else {
hash_data(server.ip);
}
# keep different cache between https and http to avoid infinite loop
if (req.http.X-Forwarded-Proto) {
hash_data(req.http.X-Forwarded-Proto);
}
if (req.http.X-Requested-With) {
hash_data(req.http.X-Requested-With);
}
return (lookup);
}
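As an illustration (the URL below is made up), the regsub chain above normalizes a request like this:
# input:                /page.html?utm_source=news&id=42&fbclid=abc
# after the regsuball:  /page.html?&id=42&    (parameters stripped, separators left behind)
# after "&+$" -> "":    /page.html?&id=42
# after "&+"  -> "&":   /page.html?&id=42     (no run of ampersands in this example)
# after "\?&" -> "?":   /page.html?id=42
# after "\?$" -> "":    /page.html?id=42      (no trailing "?")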
Any idea what could be wrong?
Best regards, Jocelyn Fournier
No idea what's going on here, but the file storage seems to be highly unreliable these days on aarch64 servers.
Could you try replacing your file storage with a 100GB malloc storage instead? Why did you add another 12GB malloc storage?
And how is /tmp mounted on this system?
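For example, something like this would show whether /tmp is on tmpfs or on a regular disk partition:
# Show the filesystem backing /tmp
findmnt --target /tmp
df -hT /tmp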
Unfortunately I "only" have 16GB of RAM on this machine. The idea was to split the data in two:
- one storage for all the images (> 1TB), where performance is not critical (there's a CDN in front of Varnish), to avoid hitting the backend too much
- one storage for all the HTML pages, in memory (16GB should be more than enough for that kind of page)
/tmp is not on tmpfs, just on the standard disk partition.
Well I'm now running without the file storage, using -s malloc,12288m -s static=malloc,1024m
However, it seems I still have random panic issues:
déc. 07 20:13:03 varnishd[2789]: Child launched OK
déc. 07 20:13:03 varnishd[2790]: Child (2802) said Child starts
déc. 07 20:13:03 varnishd[2790]: Child (2802) said <jemalloc>: Error in munmap(): Invalid argument
déc. 07 20:13:03 systemd[1]: Started Varnish Cache, a high-performance HTTP accelerator.
déc. 07 20:52:10 varnishd[2790]: Child (2802) died signal=6
déc. 07 20:52:10 varnishd[2790]: Child (2802) Panic at: Wed, 07 Dec 2022 19:52:10 GMT
Wrong turn at cache/cache_main.c:323:
Signal 11 (Segmentation fault) received at 0xffff92ffe000 si_code 2...
déc. 07 20:52:10 varnishd[2790]: Child cleanup complete
déc. 07 20:52:10 varnishd[2790]: Child (3704) Started
déc. 07 20:52:10 varnishd[2790]: Child (3704) said Child starts
déc. 07 20:52:10 varnishd[2790]: Child (3704) said <jemalloc>: Error in munmap(): Invalid argument
Extra backtrace info:
Child (2802) Panic at: Wed, 07 Dec 2022 19:52:10 GMT
Wrong turn at cache/cache_main.c:323:
Signal 11 (Segmentation fault) received at 0xffff92ffe000 si_code 2
version = varnish-7.2.1 revision 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, vrt api = 16.0
ident = Linux,5.15.75-48.135.amzn2.aarch64,aarch64,-junix,-smalloc,-smalloc,-sdefault,-hcritbit,epoll
now = 2399.653874 (mono), 1670442730.667755 (real)
Backtrace:
0x449230: /usr/sbin/varnishd() [0x449230]
0x4c41dc: /usr/sbin/varnishd(VAS_Fail+0x58) [0x4c41dc]
0x443d48: /usr/sbin/varnishd() [0x443d48]
0xffffa67df820: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0xffffa67df820]
0xffffa6472d88: /lib64/libc.so.6(+0x85d88) [0xffffa6472d88]
0x47fd3c: /usr/sbin/varnishd() [0x47fd3c]
0x480494: /usr/sbin/varnishd() [0x480494]
0x437150: /usr/sbin/varnishd(VFP_Suck+0xa4) [0x437150]
0x432ca4: /usr/sbin/varnishd() [0x432ca4]
0x435b60: /usr/sbin/varnishd() [0x435b60]
argv = {
[0] = \"/usr/sbin/varnishd\",
[1] = \"-a\",
[2] = \":80\",
[3] = \"-p\",
[4] = \"workspace_session=1024\",
[5] = \"-p\",
[6] = \"workspace_thread=4096\",
[7] = \"-p\",
[8] = \"workspace_client=131072\",
[9] = \"-p\",
[10] = \"workspace_backend=131072\",
[11] = \"-p\",
[12] = \"listen_depth=16383\",
[13] = \"-p\",
[14] = \"http_resp_hdr_len=65536\",
[15] = \"-p\",
[16] = \"http_resp_size=98304\",
[17] = \"-p\",
[18] = \"workspace_backend=131072\",
[19] = \"-p\",
[20] = \"thread_pool_min=200\",
[21] = \"-p\",
[22] = \"thread_pool_max=5000\",
[23] = \"-p\",
[24] = \"feature=+http2\",
[25] = \"-P\",
[26] = \"/run/varnish/varnishd.pid\",
[27] = \"-T\",
[28] = \"localhost:6082\",
[29] = \"-f\",
[30] = \"/etc/varnish/default.vcl\",
[31] = \"-s\",
[32] = \"malloc,12288m\",
[33] = \"-s\",
[34] = \"static=malloc,1024m\",
}
pthread.self = 0xffff974671b0
pthread.name = (cache-worker)
pthread.attr = {
guard = 4096,
stack_bottom = 0xffff97448000,
stack_top = 0xffff97468000,
stack_size = 131072,
}
thr.req = NULL
thr.busyobj = 0xffff16fb0020 {
end = 0xffff16fd0000,
retries = 0,
sess = 0xffff33ffc420 {
fd = 88, vxid = 2689893,
t_open = 1670442707.678030,
t_idle = 1670442730.614282,
ws = 0xffff33ffc460 {
id = \"ses\",
{s, f, r, e} = {0xffff33ffc4b8, +96, (nil), +832},
},
transport = HTTP/1 {
state = HTTP1::Proc
}
client = x 50916 :80,
local.endpoint = :80,
local.socket = a0,
local.ip = x:80,
remote.ip = x:50916,
server.ip = x:80,
client.ip = x:50916,
},
worker = 0xffff974666c8 {
ws = 0xffff97466750 {
id = \"wrk\",
{s, f, r, e} = {0xffff974655d0, +0, (nil), +4088},
},
VCL::method = BACKEND_RESPONSE,
VCL::methods = {BACKEND_FETCH, BACKEND_RESPONSE},
},
vfc = 0xffff16fb4f38 {
failed = 0,
req = 0xffff16fb0648,
resp = 0xffff16fb0ac0,
wrk = 0xffff974666c8,
oc = 0xffff6b24c340,
filters = {
V1F_STRAIGHT = 0xffff16fcd4b0 {
priv1 = 0xffff16fb5448,
priv2 = 117346,
closed = 0
},
},
obj_flags = 0x0,
},
filter_list = \"\",
ws = 0xffff16fb0060 {
id = \"bo\",
{s, f, r, e} = {0xffff16fb4f80, +99984, (nil), +110712},
},
ws_bo = 0xffff16fb5430,
http[bereq] = 0xffff16fb0648 {
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
hdrs {
\"GET\",
\"/2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"HTTP/1.1\",
\"X-Forwarded-Proto: https\",
\"X-Forwarded-Port: 443\",
\"X-Amzn-Trace-Id: Root=1-6390eeea-3f4ea97408e44db20ed02120\",
\"x-default-isocode: US\",
\"x-default-language: en\",
\"cf-ray: 775fccd8297cacf8-ATL\",
\"cf-visitor: {\"scheme\":\"https\"}\",
\"accept: image/webp,image/png,image/svg+xml,image/*;q=0.8,video/*;q=0.8,*/*;q=0.5\",
\"user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/238.1.487893381 Mobile/15E148 Safari/604.1\",
\"accept-language: en-US,en;q=0.9\",
\"referer: https://www.design-market.us/248098-vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.html?redirected=true\",
\"cf-connecting-ip: x\",
\"cf-pseudo-ipv4: x\",
\"cf-ipcountry: US\",
\"cdn-loop: cloudflare\",
\"X-Forwarded-For: x, x, x\",
\"Via: 1.1 ip-x.eu-west-1.compute.internal (Varnish/7.2)\",
\"host: www.design-market.us\",
\"X-ETag: noxetag\",
\"Accept-Encoding: gzip\",
\"hash_url: /2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"X-Varnish: 1249115\",
},
},
http[beresp] = 0xffff16fb0ac0 {
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
hdrs {
\"HTTP/1.1\",
\"200\",
\"OK\",
\"Server: nginx/1.22.0\",
\"Date: Wed, 07 Dec 2022 19:52:10 GMT\",
\"Content-Type: image/jpeg\",
\"Content-Length: 117346\",
\"Connection: keep-alive\",
\"Strict-Transport-Security: max-age=15552001; includeSubDomains;\",
\"Last-Modified: Wed, 12 Oct 2022 18:45:39 GMT\",
\"Accept-Ranges: bytes\",
\"Expires: Fri, 06 Jan 2023 19:52:10 GMT\",
\"X-Content-Type-Options: nosniff\",
\"X-XSS-Protection: 1; mode=block\",
\"X-Frame-Options: sameorigin\",
\"x-storage: static\",
\"cache-control: public, max-age=2592000\",
\"static-url: /2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"X-UA-Device: \",
\"X-Device-Type: \",
\"X-Country-Code: \",
\"X-ETag: noxetag\",
\"Accept-Language: en-US,en;q=0.9\",
},
},
objcore[fetch] = 0xffff6b24c340 {
refcnt = 3,
flags = {},
exp_flags = {refd},
boc = 0xffff4058cf40 {
refcnt = 3,
state = stream,
vary = (nil),
stevedore_priv = (nil),
},
exp = {1670442730.667573, 2592000.000000, 120.000000, 0.000000},
objhead = 0xffff1205b060,
stevedore = 0xffffa5bf0160 (malloc static) {
Simple = 0xffff15d61f00,
Obj = 0xffff5d156918 {priv=0xffff5d156910, ptr=0xffff15d61f00, len=744, space=744},
LEN = 0x0...0,
VXID = 0x00130f5b,
FLAGS = 0x08,
GZIPBITS = 0x0...0,
LASTMODIFIED = 0x41d8d1c2d4c00000,
VARY = {len=0, ptr=(nil)},
HEADERS = {len=624, ptr=0xffff15d61f78},
Body = 0xffff653920e8 {priv=0xffff653920e0, ptr=0xffff92ff0000, len=0, space=58673},
},
},
http_conn = 0xffff16fb5448 {
fd = 120 (@0xffff46837944),
doclose = null(Not Closing)
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
{rxbuf_b, rxbuf_e} = {0xffff16fb54b0, 0xffff16fb5807},
{pipeline_b, pipeline_e} = {0xffff16fb5807, 0xffff16fcd4b0},
content_length = 117346,
body_status = length,
first_byte_timeout = 300.000000,
between_bytes_timeout = 300.000000,
},
flags = {do_stream},
director_req = 0xffffa5a40af0 {
cli_name = boot.front,
admin_health = probe, changed = 1670440383.896565,
type = round-robin {
},
},
director_resp = 0xffffa5a40ac0 {
cli_name = boot.nlb_ip1,
admin_health = healthy, changed = 1670440383.895000,
type = backend {
conn_pool = 0xffffa5af0600 {
ident = 0xfd480fbdd018b14c3223a30af2cadc2e300897dfdf1263c3d76a6b7fd3d2069c,
vrt_endpoint = 0xffffa5a62ea0 {
ipv4 = 127.0.0.1, port = 8080,
},
},
hosthdr = localhost,
n_conn = 21,
},
},
vcl[vcl] = 0xffffa5af0380 {
name = \"boot\",
busy = 103,
discard = 0,
state = auto,
temp = warm,
conf = 0xffff95b7e650 {
syntax = \"40\",
srcname = {
[0] = \"/etc/varnish/default.vcl\",
[1] = \"/etc/varnish/backend.vcl\",
[2] = \"/etc/varnish/devicedetect.vcl\",
[3] = \"<builtin>\",
},
instances = {
\"front\" = 0xffffa5a72e00,
},
},
},
vpi = 0xffff974655c0 {
handling (VCL::return) = 0x0 (none),
ref = 161,
vpi_ref = 0xffff95b7dc90 {
source = 0 (\"/etc/varnish/default.vcl\"),
offset = 21167,
line = 525,
pos = 3,
src = \"set beresp.http.Accept-Language = bereq.[...]\"
},
},
},
thr.worker = 0xffff974666c8 {
[Already dumped, see above]
},
vmods = {
std = {0xffffa5be01d0, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
directors = {0xffffa5be0240, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
header = {0xffffa5be02b0, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 16.0},
xkey = {0xffffa5be0320, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
},
pools = {
pool = 0xffff9de40100 {
nidle = 173,
nthr = 200,
lqueue = 0
},
pool = 0xffff9de40740 {
nidle = 176,
nthr = 200,
lqueue = 0
},
},
FYI, I've just upgraded the jemalloc lib (from 3.6.0 to 5.3.0); I'm monitoring whether it has any impact on this issue (it should at least fix the
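A quick way to confirm which jemalloc the varnishd binary actually links against:
# List the libjemalloc the binary is linked with
ldd /usr/sbin/varnishd | grep -i jemalloc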
I would also be interested to learn how it fares without jemalloc at all, see #3881.
Thx, I'll also try without jemalloc at all!
FYI Varnish has been running with the original -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G and jemalloc 5.3.0 without any issue so far!
Could you please try another build with jemalloc 3.6?
./configure --with-jemalloc --with-unwind --enable-debugging-symbols
And then run varnishd with this jemalloc configuration in your environment:
export MALLOC_CONF="abort:true,redzone:true,junk:true"
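Putting those pieces together, roughly (a sketch only, assuming a varnish-7.2.1 source tree with jemalloc 3.6 development files installed; paths are examples, adjust to your environment):
# Build and install to the default /usr/local prefix
cd varnish-7.2.1
./configure --with-jemalloc --with-unwind --enable-debugging-symbols
make -j"$(nproc)"
sudo make install
# Run with the jemalloc debug options; -E preserves MALLOC_CONF across sudo
export MALLOC_CONF="abort:true,redzone:true,junk:true"
sudo -E /usr/local/sbin/varnishd -f /etc/varnish/default.vcl -a :80 \
    -s malloc,12288m -s static=malloc,1024m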
Hi @Dridi
Just to be sure, release 7.2.1 should be fine for this test, no need to use develop?
Hi @Dridi
FYI, it's not possible to start Varnish with jemalloc 3.6 and the debug config on ARM because of the munmap issue:
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Version: varnish-7.2.1 revision NOGIT
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Platform: Linux,5.10.157-139.675.amzn2.aarch64,aarch64,-junix,-smalloc,-sfile,-sdefault,-hcritbit
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (22295) Started
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (-1) said Child starts
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (-1) said
With jemalloc 5.3.0 it starts properly (but there are other issues after a while: instability / high CPU usage).
Hi @Dridi
I'm now evaluating the 7.2.1 version compiled without jemalloc (+ varnish-modules) on an ARM AWS Graviton2 processor (compiled with gcc10 and the -mcpu=neoverse-n1 flag). I'll let you know if it shows any instability.
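The build looks roughly like this (a sketch; the gcc10 compiler name on Amazon Linux 2 is an assumption, and the #3881 patch is applied to the source first):
# Sketch: build 7.2.1 without jemalloc, targeting Graviton2 (Neoverse N1)
export CC=gcc10-gcc        # assumed name of the gcc10 compiler on Amazon Linux 2
export CFLAGS="-O2 -g -mcpu=neoverse-n1"
./configure --without-jemalloc --with-unwind --enable-debugging-symbols
make -j"$(nproc)"
sudo make install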
Hi @Dridi
FYI, 7.2.1 without jemalloc, using the patch from #3881, seems to work pretty well on AWS ARM. I see a few spikes in CPU usage during aggressive Google crawls that I didn't have before (on x86_64 instances), but response times are good and stable, so it's not really an issue (this is an x2gd.large with 2 cores and 32GB of RAM).


I confirm that after more than one week of use, everything is still really stable on ARM without jemalloc.