embedded-registry causing goroutine leak-like behaviour in containerd, but only on one server
Environmental Info:
k3s version v1.32.5+k3s1 (8e8f2a47)
go version go1.23.8
Node(s) CPU architecture, OS, and Version:
Linux pi5a 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux pi5b 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux pi5c 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Linux blackbox 6.6.92-00003-g075b0ab915ae #22 SMP Sun May 25 11:09:10 UTC 2025 aarch64 GNU/Linux
Cluster Configuration: 3 servers in etcd mode (pi5a, pi5b, pi5c), one non-server agent (blackbox)
Describe the bug: embedded-registry is enabled on pi5a and pi5b with no problems. When embedded-registry is enabled on pi5c, containerd appears to steadily leak goroutines, eventually overwhelming the machine.
Steps To Reproduce:
Installation
On each server:
$ cat | sudo tee -a /etc/hosts
192.168.xx.y0 k3s-a
192.168.xx.y1 k3s-b
192.168.xx.y2 k3s-c
^D
$ curl -o k3s-install.sh -sfL https://get.k3s.io
$ chmod +x k3s-install.sh
pi5a$ ./k3s-install.sh server \
--cluster-init \
--tls-san=k3s-a \
--cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
--service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112
pi5b$ K3S_TOKEN='(REDACTED)' ./k3s-install.sh server \
--server https://k3s-a:6443 \
--tls-san=k3s-b \
--cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
--service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112
pi5c$ K3S_TOKEN='(REDACTED)' ./k3s-install.sh server \
--server https://k3s-a:6443 \
--tls-san=k3s-c \
--cluster-cidr=10.42.0.0/16,2404:e80:18b:4200::/56 \
--service-cidr=10.43.0.0/16,2404:e80:18b:4300::/112
On all servers:
$ cat | sudo tee /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: "0644"
embedded-registry: true
^D
$ cat | sudo tee /etc/rancher/k3s/registries.yaml
mirrors:
  "*":
^D
$ sudo systemctl restart k3s
Expected behavior: On two of the servers (pi5a and pi5b) the system load is fairly low and stable, whether or not embedded-registry is enabled.
Actual behavior: On one of the servers (pi5c), the CPU usage steadily increased while embedded-registry was enabled:
`sudo ctr pprof goroutines` on pi5c over time showed an increasing number of goroutines:
| Time | Lines matching goroutine \d+.*:$ |
|---|---|
| 12:15:02 | 352 |
| 12:17:22 | 748 |
| 12:24:17 | 1813 |
| 13:07:48 | 8536 |
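For what it's worth, the counts above come from matching the stack headers in the text-format goroutine dump. A minimal Go sketch of the same counting (the sample stacks below are made up, not from the real dump):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Each stack in a text-format goroutine dump starts with a header like
// "goroutine 123 [chan receive]:". Counting those headers gives the
// goroutine counts reported in the table above.
func countGoroutines(dump string) int {
	re := regexp.MustCompile(`(?m)^goroutine \d+ .*:$`)
	return len(re.FindAllString(dump, -1))
}

func main() {
	sample := strings.Join([]string{
		"goroutine 1 [running]:",
		"main.main()",
		"\t/app/main.go:10 +0x20",
		"",
		"goroutine 18 [chan receive]:",
		"main.worker()",
		"\t/app/main.go:25 +0x30",
	}, "\n")
	fmt.Println(countGoroutines(sample)) // 2
}
```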
Additional context / logs:
Start with `--enable-pprof` and grab a profile, or `kill -ABRT` the k3s process and look at the stack trace.
Note that if this is indeed containerd and not k3s, none of the above will show anything, as containerd runs as a separate process and would not count against the k3s process's CPU or memory utilization.
I grabbed some CPU and heap profiles and didn't see much that was interesting, but the goroutine profiles are definitely something.
curl --insecure https://k3s-c:6443/debug/pprof/goroutine > k3s.goroutines
As with containerd, the number of goroutines seems to be steadily increasing. I poked at it with `go tool pprof` for a bit; the top of the tree paints a picture:
(pprof) tree
Showing nodes accounting for 20226, 100% of 20231 total
Dropped 585 nodes (cum <= 101)
Showing top 80 nodes out of 110
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
2859 14.14% | google.golang.org/grpc.newClientStreamWithParams.func4
2806 13.87% | github.com/spegel-org/spegel/internal/channel.Merge[go.shape.struct { time.wall uint64; time.ext int64; time.loc *time.Location }].func1
2806 13.87% | github.com/spegel-org/spegel/internal/channel.Merge[go.shape.struct { time.wall uint64; time.ext int64; time.loc *time.Location }].func2
2806 13.87% | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1
2806 13.87% | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func2
2004 9.91% | google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader
855 4.23% | google.golang.org/grpc/internal/transport.(*recvBufferReader).readClient
426 2.11% | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
219 1.08% | google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run
219 1.08% | internal/poll.runtime_pollWait
143 0.71% | k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
127 0.63% | google.golang.org/grpc/internal/transport.(*controlBuffer).get
122 0.6% | k8s.io/apiserver/pkg/endpoints/handlers.(*WatchServer).HandleHTTP
120 0.59% | k8s.io/apimachinery/pkg/util/wait.BackoffUntil
71 0.35% | io.ReadAtLeast
53 0.26% | google.golang.org/grpc/internal/transport.(*recvBufferReader).Read
I'll take a look at the pprof file, but the text you shared just suggests that a lot of goroutines are tied up communicating via (or reconnecting) spegel's grpc channel to containerd; it doesn't say why, though.
What made you think that it was specifically a goroutine leak?
To be specific, to me it is goroutine-leak-like behaviour. It may not actually be a proper goroutine leak, but the count of goroutines in both containerd and k3s steadily increased over time, without any relation to cluster workload. Aside from the k3s profile I attached above, I took two others: one about half an hour earlier with 14905 goroutines, and one about 10 minutes earlier with 17501 goroutines.
I suspect something is not closing grpc connections when retrying, and many grpc client helpers will start a connection maintenance goroutine in the background that isn't cancelled unless the connection is explicitly closed.
I can't tell from this profile where all the calls to grpc.newClientStreamWithParams are coming from... but the spegel-org/spegel/pkg/oci.(*Containerd).Subscribe calls are all in this goroutine that handles events from containerd:
https://github.com/k3s-io/spegel/blob/v0.2.0-k3s1/pkg/oci/containerd.go#L183-L219
Active filters:
focus=github\.com/spegel-org/spegel/pkg/oci\.\(\*Containerd\)\.Subscribe\.func1|…
Showing nodes accounting for 5612, 27.74% of 20231 total
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
0 0% 0% 2004 9.91% | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:184
2004 100% | runtime.chanrecv2 /usr/local/go/src/runtime/chan.go:494
----------------------------------------------------------+-------------
0 0% 0% 802 3.96% | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func1 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:210
802 100% | runtime.chansend1 /usr/local/go/src/runtime/chan.go:156
----------------------------------------------------------+-------------
0 0% 0% 2806 13.87% | github.com/spegel-org/spegel/pkg/oci.(*Containerd).Subscribe.func2 /go/pkg/mod/github.com/k3s-io/[email protected]/pkg/oci/containerd.go:215
2806 100% | runtime.chanrecv2 /usr/local/go/src/runtime/chan.go:494
This suggests that there are just a TON of image events coming from containerd on this node.
Can you run k3s with `--debug` (or `debug: true` in the config file) and check containerd.log to see what's going on? If that doesn't show anything interesting, you might need to crank the containerd log level all the way up to trace via the `CONTAINERD_LOG_LEVEL` env var.
Not sure why I didn't check the logs before, but this stuck out immediately:
May 31 10:53:33 pi5c k3s[1811314]: time="2025-05-31T10:53:33+10:00" level=info msg="spegel 2025/05/31 10:53:33 \"msg\"=\"restarting image state tracker due to error\" \"error\"=\"received errors when updating all images: could not get digests for image git.madhouse-project.org/algernon/iocaine:2.0.0@sha256:d47363d2ce6a9f4ab9eb0e14257e67eb6c74e8090c97151fabf54d99c27e44d0: failed to walk image manifests: invalid character '<' looking for beginning of value\""
I changed registries.yaml to specify all the registries I use except that one, and the problem went away.
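For reference, the shape of the workaround in registries.yaml (the registry names below are placeholders, not my actual list):

```yaml
# Enumerate mirrored registries instead of the "*" wildcard,
# leaving out the registry serving the problematic image.
mirrors:
  docker.io:
  ghcr.io:
  registry.k8s.io:
```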
Ahh so yeah it was probably stuck in a loop restarting the image state tracker and relisting images over and over again.
You might open an issue with the Spegel project, if you have a good way to reproduce.
@DrJosh9000 please do create an issue in Spegel if you are seeing an error with a specific image.
I have not looked into the specific version being used, but Spegel will only return from the state-tracking loop if the context is cancelled or the image event channel is closed. All other state-tracking errors are only logged.
Thanks, having narrowed it down a bit I do intend to follow up with Spegel (once I get over this flu!)
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
@brandond is this a placeholder to bump Spegel when fixed, or does more need to be done on our side?
I haven't managed to reproduce this on demand or narrow it down to anything specific in spegel, but we are planning to bump spegel next month once we get a few things pushed upstream.