
embedded-registry causing goroutine leak-like behaviour in containerd, but only on one server

Open DrJosh9000 opened this issue 10 months ago • 14 comments

Environmental Info:

k3s version v1.32.5+k3s1 (8e8f2a47)
go version go1.23.8

Node(s) CPU architecture, OS, and Version:

Linux pi5a 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux pi5b 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux pi5c 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux blackbox 6.6.92-00003-g075b0ab915ae #22 SMP Sun May 25 11:09:10 UTC 2025 aarch64 GNU/Linux

Cluster Configuration: 3 servers in etcd mode (pi5a, pi5b, pi5c), one non-server agent (blackbox)

Describe the bug: embedded-registry is enabled on pi5a and pi5b with no problems. When embedded-registry is enabled on pi5c, containerd appears to steadily leak goroutines, eventually overwhelming the machine.

Steps To Reproduce:

Installation

On each server:

$ cat | sudo tee -a /etc/hosts
192.168.xx.y0 k3s-a
192.168.xx.y1 k3s-b
192.168.xx.y2 k3s-c
^D
$ curl -o k3s-install.sh -sfL https://get.k3s.io
$ chmod +x k3s-install.sh
pi5a$ ./k3s-install.sh server \
  --cluster-init \
  --tls-san=k3s-a \
  --cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
  --service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112

pi5b$ K3S_TOKEN='(REDACTED)' ./k3s-install.sh server \
  --server https://k3s-a:6443 \
  --tls-san=k3s-b \
  --cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
  --service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112

pi5c$ K3S_TOKEN='(REDACTED)' ./k3s-install.sh server \
  --server https://k3s-a:6443 \
  --tls-san=k3s-c \
  --cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
  --service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112

On all servers:

$ cat | sudo tee /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: "0644"
embedded-registry: true
^D
$ cat | sudo tee /etc/rancher/k3s/registries.yaml
mirrors:
  "*":
^D
$ sudo systemctl restart k3s

Expected behavior: On two of the servers (pi5a and pi5b) the system load is fairly low and stable, whether or not embedded-registry is enabled.

Actual behavior: On one of the servers (pi5c), the CPU usage steadily increased while embedded-registry was enabled:

[Image: graph showing CPU usage on pi5c steadily climbing over time]

sudo ctr pprof goroutines on pi5c over time showed an increasing number of goroutines:

Time       Lines matching `goroutine \d+.*:$`
12:15:02    352
12:17:22    748
12:24:17   1813
13:07:48   8536
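
The counts above come from grepping stack headers out of the text-format dump. A minimal Go sketch of the same counting (the dump text here is made up for illustration; the real dumps come from `ctr pprof goroutines`):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// countGoroutines counts stack headers such as
// "goroutine 42 [chan receive]:" in a text-format goroutine dump,
// using the same pattern as the table above.
func countGoroutines(dump string) int {
	re := regexp.MustCompile(`^goroutine \d+.*:$`)
	n := 0
	for _, line := range strings.Split(dump, "\n") {
		if re.MatchString(line) {
			n++
		}
	}
	return n
}

func main() {
	dump := "goroutine 1 [running]:\nmain.main()\n\ngoroutine 2 [chan receive]:\nmain.worker()\n"
	fmt.Println(countGoroutines(dump)) // prints 2
}
```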

Additional context / logs:

containerd-130748.goroutines.zip

DrJosh9000 avatar May 28 '25 03:05 DrJosh9000

Start with --enable-pprof and grab a profile, or kill -ABRT the k3s process and look at the stack trace.

Note that if this is indeed containerd and not k3s, none of the above will show anything, as containerd is run as a separate process and would not count against K3s process CPU or memory utilization.

brandond avatar May 28 '25 04:05 brandond

I grabbed some CPU and heap profiles and didn't see much that was interesting, but the goroutine profiles are definitely something.

curl --insecure https://k3s-c:6443/debug/pprof/goroutine > k3s.goroutines

As with containerd, the number of goroutines seems to be steadily increasing. I poked at it with `go tool pprof` for a bit; the top of the `tree` output paints a picture:

(pprof) tree
Showing nodes accounting for 20226, 100% of 20231 total
Dropped 585 nodes (cum <= 101)
Showing top 80 nodes out of 110
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                              2859 14.14% |   google.golang.org/grpc.newClientStreamWithParams.func4
                                              2806 13.87% |   github.com/spegel-org/spegel/internal/channel.Merge[go.shape.struct { time.wall uint64; time.ext int64; time.loc *time.Location }].func1
                                              2806 13.87% |   github.com/spegel-org/spegel/internal/channel.Merge[go.shape.struct { time.wall uint64; time.ext int64; time.loc *time.Location }].func2
                                              2806 13.87% |   github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1
                                              2806 13.87% |   github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func2
                                              2004  9.91% |   google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader
                                               855  4.23% |   google.golang.org/grpc/internal/transport.(*recvBufferReader).readClient
                                               426  2.11% |   k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
                                               219  1.08% |   google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run
                                               219  1.08% |   internal/poll.runtime_pollWait
                                               143  0.71% |   k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
                                               127  0.63% |   google.golang.org/grpc/internal/transport.(*controlBuffer).get
                                               122   0.6% |   k8s.io/apiserver/pkg/endpoints/handlers.(*WatchServer).HandleHTTP
                                               120  0.59% |   k8s.io/apimachinery/pkg/util/wait.BackoffUntil
                                                71  0.35% |   io.ReadAtLeast
                                                53  0.26% |   google.golang.org/grpc/internal/transport.(*recvBufferReader).Read

k3s-1745.goroutines.pprof.zip

DrJosh9000 avatar May 28 '25 07:05 DrJosh9000

I'll take a look at the pprof file, but the text you shared just suggests that a lot of CPU time is being spent communicating via (or reconnecting) spegel's grpc channel to containerd; it doesn't say why, though.

What made you think that it was specifically a goroutine leak?

brandond avatar May 29 '25 03:05 brandond

To be specific, to me it is goroutine-leak-like behaviour. It may not actually be a proper goroutine leak, but the count of goroutines in containerd and k3s steadily increased over time, without any relation to cluster workload. Aside from the k3s profile I attached above, I took two others - one about half an hour before with 14905 goroutines, and one about 10 minutes before which had 17501 goroutines.

DrJosh9000 avatar May 29 '25 03:05 DrJosh9000

I suspect something is not closing grpc connections when retrying, and many grpc client helpers will start a connection maintenance goroutine in the background that isn't cancelled unless the connection is explicitly closed.

brandond avatar May 29 '25 03:05 brandond

I can't tell from this profile where all the calls to grpc.newClientStreamWithParams are coming from... but the spegel-org/spegel/pkg/oci.(*Containerd).Subscribe calls are all in this goroutine that handles events from containerd:

https://github.com/k3s-io/spegel/blob/v0.2.0-k3s1/pkg/oci/containerd.go#L183-L219

Active filters:
   focus=github\.com/spegel-org/spegel/pkg/oci\.\(\*Containerd\)\.Subscribe\.func1|…
Showing nodes accounting for 5612, 27.74% of 20231 total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
         0     0%     0%       2004  9.91%                | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:184
                                              2004   100% |   runtime.chanrecv2 /usr/local/go/src/runtime/chan.go:494
----------------------------------------------------------+-------------
         0     0%     0%        802  3.96%                | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:210
                                               802   100% |   runtime.chansend1 /usr/local/go/src/runtime/chan.go:156
----------------------------------------------------------+-------------
         0     0%     0%       2806 13.87%                | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func2 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:215
                                              2806   100% |   runtime.chanrecv2 /usr/local/go/src/runtime/chan.go:494

This suggests that there are just a TON of image events coming from containerd on this node.

Can you run k3s with --debug or debug: true in the config file, and check the containerd.log to see what's going on? If that doesn't show anything interesting you might need to crank the containerd log level all the way up to trace via the CONTAINERD_LOG_LEVEL env var.

brandond avatar May 30 '25 17:05 brandond

Not sure why I didn't check the logs before, but this stuck out immediately:

May 31 10:53:33 pi5c k3s[1811314]: time="2025-05-31T10:53:33+10:00" level=info msg="spegel 2025/05/31 10:53:33 \"msg\"=\"restarting image state tracker due to error\" \"error\"=\"received errors when updating all images: could not get digests for image git.madhouse-project.org/algernon/iocaine:2.0.0@sha256:d47363d2ce6a9f4ab9eb0e14257e67eb6c74e8090c97151fabf54d99c27e44d0: failed to walk image manifests: invalid character '<' looking for beginning of value\""

I changed registries.yaml to specify all the registries I use except that one, and the problem went away.
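
For reference, a registries.yaml along those lines might look like this (the registry hostnames below are examples, not the actual list used):

```yaml
# Mirror only known-good registries instead of the "*" wildcard,
# so the problematic registry is excluded from the embedded mirror.
mirrors:
  "docker.io":
  "ghcr.io":
  "registry.k8s.io":
```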

DrJosh9000 avatar May 31 '25 01:05 DrJosh9000

Ahh so yeah it was probably stuck in a loop restarting the image state tracker and relisting images over and over again.
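
A hypothetical sketch of that loop (not Spegel's actual code): with one persistently-bad image, every restart triggers another full relist, so the work is unbounded for as long as the error persists:

```go
package main

import "fmt"

// trackImages simulates one state-tracker iteration: it walks every
// image and fails when it hits the bad manifest, as in the log line above.
func trackImages(images []string, bad string) error {
	for _, img := range images {
		if img == bad {
			return fmt.Errorf("could not get digests for image %s: invalid character '<' looking for beginning of value", img)
		}
	}
	return nil
}

// restartCount shows how "restart the tracker on error" turns one bad
// image into an endless series of full relists (capped here at max).
func restartCount(images []string, bad string, max int) int {
	n := 0
	for n < max {
		if err := trackImages(images, bad); err == nil {
			break
		}
		n++ // restart the state tracker and relist everything, again
	}
	return n
}

func main() {
	images := []string{"docker.io/library/nginx:latest", "example.org/bad:1"}
	fmt.Println(restartCount(images, "example.org/bad:1", 1000)) // prints 1000
}
```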

You might open an issue with the Spegel project, if you have a good way to reproduce.

brandond avatar May 31 '25 03:05 brandond

@DrJosh9000 please do create an issue in Spegel if you are seeing an error with a specific image.

I have not looked into the specific version being used but Spegel will only return from the state track loop if the context or image channel is closed. All other errors related to state tracking will only log the error.

phillebaba avatar Jun 02 '25 07:06 phillebaba

Thanks, having narrowed it down a bit I do intend to follow up with Spegel (once I get over this flu!)

DrJosh9000 avatar Jun 02 '25 08:06 DrJosh9000

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

github-actions[bot] avatar Jul 17 '25 20:07 github-actions[bot]

@brandond is this a placeholder to bump Spegel when fixed, or does more need to be done on our side?

caroline-suse-rancher avatar Sep 10 '25 19:09 caroline-suse-rancher

I haven't managed to reproduce this on demand or narrow it down to anything specific in spegel, but we are planning to bump spegel next month once we get a few things pushed upstream.

brandond avatar Sep 10 '25 21:09 brandond