Unix socket backend leads to high cpu usage in some cases

Open jogoossens opened this issue 1 year ago • 1 comments

Expected Behavior

No unexplained high cpu load when using unix socket on backend.

Current Behavior

For a long time we had intermittent high CPU usage with Varnish. After many hours of troubleshooting and searching, we finally changed our backend from a Unix socket to TCP port 8080, and the high CPU usage went away.

When we (re)start varnish, initially there is no high load at all, but after some time (one hour or a few hours) the vcache process experiences high cpu usage.

Possible Solution

No response

Steps to Reproduce (for bugs)

We have switched back and forth between the following two settings, and this change alone makes the difference between "stable" and "high cpu usage".

High cpu usage

backend default_backend {
    .path = "/var/lib/nginx/default.sock";
    .connect_timeout        = 5s;
    .first_byte_timeout     = 600s;
}

No high cpu usage

backend default_backend {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout        = 5s;
    .first_byte_timeout     = 600s;
}

Context

We ran perf top and saw high CPU usage in data copying, but so far it has not been easy to pinpoint the source.

We have had this problem for a long time (at least since Varnish 7.0 and in all following versions, probably in 6.x too, but we are unsure).

Varnish Cache version

varnishd 7.6.0

Operating system

Debian 12

Source of binary packages used (if any)

No response

jogoossens avatar Sep 30 '24 07:09 jogoossens

We'll look at it, but it would be really helpful if you could get some perf data, yes. Or a reproducer. Also could you please report which VMODs you use, if any?

nigoroll avatar Sep 30 '24 13:09 nigoroll

ping @jogoossens ?

nigoroll avatar Nov 03 '24 16:11 nigoroll

Hi, we have somehow worked around the problem now (it had been there for easily a year, and earlier attempts never got rid of it).

We disabled HTTP/2 completely with a startup parameter; we are not sure whether that is related. We possibly changed other things too, as we tried so many things while testing this.

jogoossens avatar Nov 04 '24 08:11 jogoossens

Without any tangible information this issue is not actionable. If anyone has anything for us to work on, please reopen.

nigoroll avatar Nov 04 '24 09:11 nigoroll

The issue comes back from time to time. Is there any way we could debug this? :)

jogoossens avatar Nov 27 '25 19:11 jogoossens

I believe this was fixed in https://github.com/varnishcache/varnish-cache/pull/4386 so you would either need to apply this change or upgrade to 8.0.0, the only release with this fix.

dridi avatar Nov 28 '25 08:11 dridi

OK, great news. Will it be backported in a future 7.x update? :)

jogoossens avatar Nov 28 '25 08:11 jogoossens

I upgraded to Varnish 8 as a test, but I still see 100% CPU usage. I assume the cause is something else, unrelated to the Unix sockets.

So is there any way we could debug this? :)

jogoossens avatar Dec 04 '25 23:12 jogoossens

OK, I found it: it was a classic issue, just too many bans! For anyone interested, perf top pointed me in the right direction very quickly.
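
For readers hitting the same symptom: a long ban list tends to get expensive when the bans reference `req.*`, because the ban lurker cannot evaluate those in the background and they are only tested against objects at lookup time. Below is a minimal sketch of "lurker-friendly" bans, assuming bans are issued from VCL through a `BAN` request method. The `purgers` ACL and the `x-url` / `x-host` header names are illustrative assumptions, not taken from the setup in this issue, and the fragment is meant to be merged into an existing VCL rather than loaded on its own.

```vcl
# Hypothetical ACL: clients allowed to issue bans.
acl purgers {
    "127.0.0.1";
}

sub vcl_backend_response {
    # Copy request properties onto the object so that ban
    # expressions can match on obj.* only.
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (client.ip !~ purgers) {
            return (synth(403, "Forbidden"));
        }
        # The expression references only obj.*, so the ban lurker
        # can test it in the background and retire the ban.
        ban("obj.http.x-host == " + req.http.host +
            " && obj.http.x-url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
}

sub vcl_deliver {
    # Do not leak the helper headers to clients.
    unset resp.http.x-url;
    unset resp.http.x-host;
}
```

To check whether bans are piling up, `varnishadm ban.list` prints the outstanding bans, and the `MAIN.bans` counter in `varnishstat` tracks how many there are.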

jogoossens avatar Dec 08 '25 19:12 jogoossens