Unix socket backend leads to high cpu usage in some cases
Expected Behavior
No unexplained high cpu load when using unix socket on backend.
Current Behavior
We have had intermittent high CPU usage with Varnish for a long time. After many hours of troubleshooting and searching, we finally switched our backend from a Unix socket to TCP port 8080, and the high CPU usage stayed away.
When we (re)start varnish, initially there is no high load at all, but after some time (one hour or a few hours) the vcache process experiences high cpu usage.
Possible Solution
No response
Steps to Reproduce (for bugs)
We now switched back and forth between the following settings, and this change alone makes it "stable" versus "high cpu usage".
High cpu usage
backend default_backend {
    .path = "/var/lib/nginx/default.sock";
    .connect_timeout = 5s;
    .first_byte_timeout = 600s;
}
No high cpu usage
backend default_backend {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 5s;
    .first_byte_timeout = 600s;
}
Context
We ran perf top and saw high CPU usage in copying data, but so far it has not been easy to identify the source.
We have had this problem for a long time (at least since Varnish 7.0 and in all later versions; probably in 6.x too, but we are unsure).
Varnish Cache version
varnishd 7.6.0
Operating system
Debian 12
Source of binary packages used (if any)
No response
We'll look at it, but it would be really helpful if you could get some perf data, yes. Or a reproducer. Also could you please report which VMODs you use, if any?
ping @jogoossens ?
Hi, somehow we have bypassed the problem now (it had been there for easily a year, and earlier attempts never got rid of it...).
We disabled http2 completely with a startup parameter; not sure if that could be related. We possibly changed other things too, since we tried so many things while testing this.
Without any tangible information this issue is not actionable. If anyone has anything for us to work on, please reopen.
The issue comes back from time to time, any ways we could debug this? :)
I believe this was fixed in https://github.com/varnishcache/varnish-cache/pull/4386 so you would either need to apply this change or upgrade to 8.0.0, the only release with this fix.
OK, great news. So will it be "backported" in an upcoming update for 7.x? :)
I upgraded to varnish 8 as a test, but I still encounter 100% cpu usage. I assume the reason is something else unrelated to the unix sockets...
So any ways we could debug this? :)
OK, I found it: it was a classic issue, just too many bans! For anyone interested, perf top pointed me in the right direction very quickly.
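For anyone wanting to check the ban list before reaching for perf, here is a rough sketch (not something used in this thread) that reads the ban counters from `varnishstat -1`. It assumes the usual one-counter-per-line layout (name, value, rate, description) and the `MAIN.bans` / `MAIN.bans_completed` counters. The helper names and the threshold of 1000 are hypothetical and only there for illustration.

```python
import subprocess


def parse_varnishstat(text):
    """Parse `varnishstat -1` style lines into {counter_name: int_value}.

    Assumed line layout: name, value, rate, description...
    Lines that do not fit that layout are skipped.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            counters[fields[0]] = int(fields[1])
        except ValueError:
            continue
    return counters


def ban_summary(counters, threshold=1000):
    """Summarise ban counters; `threshold` is an arbitrary illustrative cutoff."""
    bans = counters.get("MAIN.bans", 0)
    completed = counters.get("MAIN.bans_completed", 0)
    return {
        "bans": bans,
        "completed": completed,
        "active": bans - completed,
        "suspicious": bans > threshold,
    }


def read_live_counters():
    """Run `varnishstat -1` on the local host and parse its output."""
    result = subprocess.run(
        ["varnishstat", "-1"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_varnishstat(result.stdout)
```

On the Varnish host, `ban_summary(read_live_counters())` gives a quick view of how many bans are outstanding. What counts as "too many" depends on the setup, so treat the `suspicious` flag as a hint only.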