[BUG] Create/start-start-stop-start results in failure
What did you do
- How was the cluster created?
  - k3d cluster create toto
- What did you do afterwards?
  - k3d cluster start toto
  - k3d cluster stop toto
  - k3d cluster start toto
What did you expect to happen
There should be no error. Also, I believe that the -tools container should have been deleted after every start.
Screenshots or terminal output
root@ubuntu:~# k3d cluster create toto
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-toto'
INFO[0000] Created image volume k3d-toto-images
INFO[0000] Starting new tools node...
INFO[0000] Starting Node 'k3d-toto-tools'
INFO[0001] Creating node 'k3d-toto-server-0'
INFO[0001] Creating LoadBalancer 'k3d-toto-serverlb'
INFO[0001] Using the k3d-tools node to gather environment information
INFO[0001] HostIP: using network gateway 172.19.0.1 address
INFO[0001] Starting cluster 'toto'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-toto-server-0'
INFO[0007] All agents already running.
INFO[0007] Starting helpers...
INFO[0007] Starting Node 'k3d-toto-serverlb'
INFO[0014] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap...
INFO[0016] Cluster 'toto' created successfully!
INFO[0016] You can now use it like this:
kubectl cluster-info
root@ubuntu:~# k3d cluster start toto
INFO[0001] Using the k3d-tools node to gather environment information
INFO[0001] Starting new tools node...
INFO[0002] Starting Node 'k3d-toto-tools'
INFO[0003] HostIP: using network gateway 172.19.0.1 address
INFO[0003] Starting cluster 'toto'
INFO[0003] All servers already running.
INFO[0003] All agents already running.
INFO[0003] All helpers already running.
INFO[0003] Started cluster 'toto'
root@ubuntu:~# docker container ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
031322f4042f rancher/k3d-tools:5.3.0 "/app/k3d-tools noop" 7 seconds ago Up 4 seconds k3d-toto-tools
fdaced82c246 rancher/k3d-proxy:5.3.0 "/bin/sh -c nginx-pr…" 51 seconds ago Up 44 seconds 80/tcp, 0.0.0.0:35215->6443/tcp k3d-toto-serverlb
80218eba68ee rancher/k3s:v1.22.6-k3s1 "/bin/k3s server --t…" 51 seconds ago Up 50 seconds k3d-toto-server-0
root@ubuntu:~# k3d cluster stop toto
INFO[0000] Stopping cluster 'toto'
INFO[0011] Stopped cluster 'toto'
root@ubuntu:~# k3d cluster start toto
INFO[0000] Using the k3d-tools node to gather environment information
INFO[0000] Starting existing tools node k3d-toto-tools...
INFO[0000] Starting Node 'k3d-toto-tools'
INFO[0000] HostIP: using network gateway 172.19.0.1 address
INFO[0000] Starting cluster 'toto'
INFO[0000] Starting servers...
INFO[0000] Starting Node 'k3d-toto-server-0'
INFO[0005] All agents already running.
INFO[0005] Starting helpers...
FATA[0005] Failed to add one or more helper nodes: runtime failed to start node 'k3d-toto-tools': failed to get container for node 'k3d-toto-tools': Didn't find container for node 'k3d-toto-tools'
Which OS & Architecture
- Linux Ubuntu 20.04.4
Which version of k3d
- k3d version v5.3.0
- k3s version v1.22.6-k3s1 (default)
Which version of docker
- docker version 20.10.7
Hi @gourvy, thanks for opening this issue!
> Also, I believe that the -tools container should have been deleted after every start.
This is 100% true, but it seems like the main function returns before the goroutine deleting the tools node has finished, so the node is still there. The error at the end is caused by the fact that the goroutine deleting the tools node now has enough time to delete it, so it is no longer present when the helper nodes are being started :exploding_head:
Anyway, I just made sure that the tools node gets deleted properly, as the time required to do so is pretty negligible :+1:
I ran into this on k3d v5.4.1:
.venv ❯ k3d cluster start orion
INFO[0000] Using the k3d-tools node to gather environment information
INFO[0000] Starting existing tools node k3d-orion-tools...
INFO[0000] Starting Node 'k3d-orion-tools'
INFO[0001] Starting new tools node...
INFO[0001] Starting Node 'k3d-orion-tools'
INFO[0003] Starting cluster 'orion'
INFO[0003] Starting servers...
INFO[0003] Starting Node 'k3d-orion-server-0'
INFO[0014] All agents already running.
INFO[0014] Starting helpers...
INFO[0014] Starting Node 'k3d-orion-serverlb'
INFO[0014] Starting Node 'orion-registry'
FATA[0014] Failed to add one or more helper nodes: runtime failed to start node 'k3d-orion-tools': failed to get container for node 'k3d-orion-tools': Didn't find container for node 'k3d-orion-tools'
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7684c5ae7f5f ghcr.io/k3d-io/k3d-tools:5.4.1 "/app/k3d-tools noop" 4 minutes ago Up 4 minutes k3d-orion-tools
c0cbba3e9d2d ghcr.io/k3d-io/k3d-proxy:5.4.1 "/bin/sh -c nginx-pr…" 22 hours ago Up 18 minutes 0.0.0.0:10001->80/tcp, 0.0.0.0:58268->6443/tcp k3d-ray-serverlb
4057c2632816 rancher/k3s:v1.23.6-k3s1 "/bin/k3d-entrypoint…" 22 hours ago Up 18 minutes k3d-ray-agent-1
a4c47c99a441 rancher/k3s:v1.23.6-k3s1 "/bin/k3d-entrypoint…" 22 hours ago Up 18 minutes k3d-ray-agent-0
09e050f006c4 rancher/k3s:v1.23.6-k3s1 "/bin/k3d-entrypoint…" 22 hours ago Up 18 minutes k3d-ray-server-0
1bf5eab03f0f 44d68381e3bd "/bin/sh -c nginx-pr…" 13 days ago Up 4 minutes 0.0.0.0:9000-9001->9000-9001/tcp, 0.0.0.0:4200->80/tcp, 0.0.0.0:60359->6443/tcp k3d-orion-serverlb
3082234cca0c rancher/k3s:v1.22.7-k3s1 "/bin/k3d-entrypoint…" 2 weeks ago Up 4 minutes k3d-orion-server-0
43a2cb74066f registry:2 "/entrypoint.sh /etc…" 2 weeks ago Up 4 minutes 0.0.0.0:5550->5000/tcp orion-registry
d6925bd0e10e registry:2 "/entrypoint.sh /etc…" 3 weeks ago Up 18 minutes 0.0.0.0:5555->5000/tcp registry