[BUG] too many open files
What did you do
- How was the cluster created?
  - k3d registry create own-registry.localhost -p 5000
  - k3d cluster create c1 --kubeconfig-update-default --timeout 300s --agents 1 --k3s-arg --disable=traefik@server:0 --image rancher/k3s:v1.20.11-k3s1 --registry-use own-registry.localhost:5000 --port 80:80@loadbalancer --port 443:443@loadbalancer --verbose
- What did you do afterwards?
  - delete the cluster
What did you expect to happen
I expected the cluster to be created successfully. I know this is related to the OS/Docker file descriptor limits: increasing the limit with ulimit -n 512 (or any higher number) fixes the issue. However, I am wondering if there is a more elegant way of handling this without the need to manually adjust the ulimit?
The current workaround is to create the cluster without agents, i.e. running k3d cluster create c1 --kubeconfig-update-default --timeout 300s --k3s-arg --disable=traefik@server:0 --image rancher/k3s:v1.20.11-k3s1 --registry-use own-registry.localhost:5000 --port 80:80@loadbalancer --port 443:443@loadbalancer. That way, no ulimit changes are required.
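For reference, the ulimit workaround as a shell session (the limit value is the one mentioned above; it only affects the current shell and its children):

```sh
# Raise the soft open-file limit for the current shell session,
# then create the cluster with agents as usual:
ulimit -n 512
k3d cluster create c1 --kubeconfig-update-default --timeout 300s --agents 1 \
  --k3s-arg --disable=traefik@server:0 \
  --image rancher/k3s:v1.20.11-k3s1 \
  --registry-use own-registry.localhost:5000 \
  --port 80:80@loadbalancer --port 443:443@loadbalancer
```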
Screenshots or terminal output
DEBU[0000] Runtime Info:
&{Name:docker Endpoint:/var/run/docker.sock Version:20.10.8 OSType:linux OS:Docker Desktop Arch:x86_64 CgroupVersion:1 CgroupDriver:cgroupfs Filesystem:extfs}
DEBU[0000] Additional CLI Configuration:
cli:
  api-port: ""
  env: []
  k3s-node-labels: []
  k3sargs:
  - --disable=traefik@server:0
  ports:
  - 80:80@loadbalancer
  - 443:443@loadbalancer
  registries:
    create: ""
  runtime-labels: []
  volumes: []
DEBU[0000] Configuration:
agents: 1
image: rancher/k3s:v1.20.11-k3s1
network: ""
options:
  k3d:
    disableimagevolume: false
    disableloadbalancer: false
    disablerollback: false
    loadbalancer:
      configoverrides: []
    timeout: 5m0s
    wait: true
  kubeconfig:
    switchcurrentcontext: true
    updatedefaultkubeconfig: true
  runtime:
    agentsmemory: ""
    gpurequest: ""
    serversmemory: ""
registries:
  config: ""
  use:
  - own-registry.localhost:5000
servers: 1
subnet: ""
token: ""
DEBU[0000] ========== Simple Config ==========
{TypeMeta:{Kind:Simple APIVersion:k3d.io/v1alpha3} Name: Servers:1 Agents:1 ExposeAPI:{Host: HostIP: HostPort:} Image:rancher/k3s:v1.20.11-k3s1 Network: Subnet: ClusterToken: Volumes:[] Ports:[] Options:{K3dOptions:{Wait:true Timeout:5m0s DisableLoadbalancer:false DisableImageVolume:false NoRollback:false NodeHookActions:[] Loadbalancer:{ConfigOverrides:[]}} K3sOptions:{ExtraArgs:[] NodeLabels:[]} KubeconfigOptions:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true} Runtime:{GPURequest: ServersMemory: AgentsMemory: Labels:[]}} Env:[] Registries:{Use:[own-registry.localhost:5000] Create:<nil> Config:}}
==========================
DEBU[0000] ========== Merged Simple Config ==========
{TypeMeta:{Kind:Simple APIVersion:k3d.io/v1alpha3} Name: Servers:1 Agents:1 ExposeAPI:{Host: HostIP: HostPort:53947} Image:rancher/k3s:v1.20.11-k3s1 Network: Subnet: ClusterToken: Volumes:[] Ports:[{Port:443:443 NodeFilters:[loadbalancer]} {Port:80:80 NodeFilters:[loadbalancer]}] Options:{K3dOptions:{Wait:true Timeout:5m0s DisableLoadbalancer:false DisableImageVolume:false NoRollback:false NodeHookActions:[] Loadbalancer:{ConfigOverrides:[]}} K3sOptions:{ExtraArgs:[{Arg:--disable=traefik NodeFilters:[server:0]}] NodeLabels:[]} KubeconfigOptions:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true} Runtime:{GPURequest: ServersMemory: AgentsMemory: Labels:[]}} Env:[] Registries:{Use:[own-registry.localhost:5000] Create:<nil> Config:}}
==========================
INFO[0000] portmapping '443:443' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
INFO[0000] portmapping '80:80' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
DEBU[0000] generated loadbalancer config:
ports:
  80.tcp:
  - k3d-c1-server-0
  - k3d-c1-agent-0
  443.tcp:
  - k3d-c1-server-0
  - k3d-c1-agent-0
  6443.tcp:
  - k3d-c1-server-0
settings:
  workerConnections: 1024
DEBU[0000] ===== Merged Cluster Config =====
&{TypeMeta:{Kind: APIVersion:} Cluster:{Name:c1 Network:{Name:k3d-c1 ID: External:false IPAM:{IPPrefix:zero IPPrefix IPsUsed:[] Managed:false} Members:[]} Token: Nodes:[0xc00019c600 0xc00019d500 0xc00019d680] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc00033eec0 ServerLoadBalancer:0xc000319630 ImageVolume:} ClusterCreateOpts:{DisableImageVolume:false WaitForServer:true Timeout:5m0s DisableLoadBalancer:false GPURequest: ServersMemory: AgentsMemory: NodeHooks:[] GlobalLabels:map[app:k3d] GlobalEnv:[] Registries:{Create:<nil> Use:[0xc000379980] Config:<nil>}} KubeconfigOpts:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true}}
===== ===== =====
DEBU[0000] ===== Processed Cluster Config =====
&{TypeMeta:{Kind: APIVersion:} Cluster:{Name:c1 Network:{Name:k3d-c1 ID: External:false IPAM:{IPPrefix:zero IPPrefix IPsUsed:[] Managed:false} Members:[]} Token: Nodes:[0xc00019c600 0xc00019d500 0xc00019d680] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc00033eec0 ServerLoadBalancer:0xc000319630 ImageVolume:} ClusterCreateOpts:{DisableImageVolume:false WaitForServer:true Timeout:5m0s DisableLoadBalancer:false GPURequest: ServersMemory: AgentsMemory: NodeHooks:[] GlobalLabels:map[app:k3d] GlobalEnv:[] Registries:{Create:<nil> Use:[0xc000379980] Config:<nil>}} KubeconfigOpts:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true}}
===== ===== =====
DEBU[0000] '--kubeconfig-update-default set: enabling wait-for-server
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-c1'
INFO[0000] Created volume 'k3d-c1-images'
DEBU[0000] Trying to find registry own-registry.localhost
DEBU[0000] no netlabel present on container /k3d-own-registry.localhost
DEBU[0000] failed to get IP for container /k3d-own-registry.localhost as we couldn't find the cluster network
DEBU[0000] no netlabel present on container /k3d-own-registry.localhost
DEBU[0000] failed to get IP for container /k3d-own-registry.localhost as we couldn't find the cluster network
DEBU[0000] no netlabel present on container /k3d-own-registry.localhost
DEBU[0000] failed to get IP for container /k3d-own-registry.localhost as we couldn't find the cluster network
INFO[0000] Starting new tools node...
DEBU[0000] Created container k3d-c1-tools (ID: 35d9102fe7fe369e35fe52b31a0b21c6bef7f931ec39a6a62182c171a4b40de5)
DEBU[0000] Node k3d-c1-tools Start Time: 2021-10-14 14:40:37.536194 +0200 CEST m=+0.417687421
INFO[0000] Starting Node 'k3d-c1-tools'
DEBU[0000] Truncated 2021-10-14 12:40:38.066550848 +0000 UTC to 2021-10-14 12:40:38 +0000 UTC
INFO[0001] Creating node 'k3d-c1-server-0'
DEBU[0001] DockerHost:
DEBU[0001] Created container k3d-c1-server-0 (ID: 58fc2f76961e17f0f5d6f943ac6436b0602a235384b9dede9cc9991be87d3521)
DEBU[0001] Created node 'k3d-c1-server-0'
INFO[0001] Creating node 'k3d-c1-agent-0'
DEBU[0001] Created container k3d-c1-agent-0 (ID: 4d8369abe37882d549cba2fd88dab49068620460a1812fdaf1edd3a94ed106ad)
DEBU[0001] Created node 'k3d-c1-agent-0'
INFO[0001] Creating LoadBalancer 'k3d-c1-serverlb'
DEBU[0001] Created container k3d-c1-serverlb (ID: bb544230c74c2ec0ab4515f5c981a6ba8df5fd71f4dfd5455e098abd87cbb144)
DEBU[0001] Created loadbalancer 'k3d-c1-serverlb'
INFO[0001] Using the k3d-tools node to gather environment information
DEBU[0001] no netlabel present on container /k3d-c1-tools
DEBU[0001] failed to get IP for container /k3d-c1-tools as we couldn't find the cluster network
DEBU[0001] no netlabel present on container /k3d-c1-tools
DEBU[0001] failed to get IP for container /k3d-c1-tools as we couldn't find the cluster network
DEBU[0001] Executing command '[sh -c getent ahostsv4 'host.docker.internal']' in node 'k3d-c1-tools'
DEBU[0002] Exec process in node 'k3d-c1-tools' exited with '0'
DEBU[0002] Hostname 'host.docker.internal' -> Address '192.168.65.2'
INFO[0002] Starting cluster 'c1'
INFO[0002] Starting servers...
DEBU[0002] Deleting node k3d-c1-tools ...
DEBU[0002] No fix enabled.
DEBU[0002] Node k3d-c1-server-0 Start Time: 2021-10-14 14:40:39.880629 +0200 CEST m=+2.762233983
INFO[0002] Deleted k3d-c1-tools
INFO[0002] Starting Node 'k3d-c1-server-0'
DEBU[0003] Truncated 2021-10-14 12:40:40.507381917 +0000 UTC to 2021-10-14 12:40:40 +0000 UTC
DEBU[0003] Waiting for node k3d-c1-server-0 to get ready (Log: 'k3s is up and running')
DEBU[0009] Finished waiting for log message 'k3s is up and running' from node 'k3d-c1-server-0'
INFO[0009] Starting agents...
DEBU[0009] No fix enabled.
DEBU[0009] Node k3d-c1-agent-0 Start Time: 2021-10-14 14:40:46.958177 +0200 CEST m=+9.840119863
INFO[0010] Starting Node 'k3d-c1-agent-0'
DEBU[0010] Truncated 2021-10-14 12:40:47.582785575 +0000 UTC to 2021-10-14 12:40:47 +0000 UTC
DEBU[0010] Waiting for node k3d-c1-agent-0 to get ready (Log: 'Successfully registered node')
DEBU[0022] Finished waiting for log message 'Successfully registered node' from node 'k3d-c1-agent-0'
INFO[0022] Starting helpers...
DEBU[0022] Node k3d-c1-serverlb Start Time: 2021-10-14 14:40:59.83467 +0200 CEST m=+22.717227947
INFO[0022] Starting Node 'k3d-c1-serverlb'
DEBU[0023] Truncated 2021-10-14 12:41:00.478078007 +0000 UTC to 2021-10-14 12:41:00 +0000 UTC
DEBU[0023] Waiting for node k3d-c1-serverlb to get ready (Log: 'start worker processes')
DEBU[0029] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
ERRO[0029] Failed Cluster Start: Failed to add one or more helper nodes: Node k3d-c1-serverlb failed to get ready: Failed waiting for log message 'start worker processes' from node 'k3d-c1-serverlb': failed ton inspect container 'bb544230c74c2ec0ab4515f5c981a6ba8df5fd71f4dfd5455e098abd87cbb144': error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/bb544230c74c2ec0ab4515f5c981a6ba8df5fd71f4dfd5455e098abd87cbb144/json": dial unix /var/run/docker.sock: socket: too many open files
ERRO[0029] Failed to create cluster >>> Rolling Back
INFO[0029] Deleting cluster 'c1'
WARNING: Error loading config file: /Users/D073497/.docker/config.json: open /Users/D073497/.docker/config.json: too many open files
DEBU[0029] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
ERRO[0029] Failed to get nodes for cluster 'c1': docker failed to get containers with labels 'map[k3d.cluster:c1]': failed to list containers: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22app%3Dk3d%22%3Atrue%2C%22k3d.cluster%3Dc1%22%3Atrue%7D%7D&limit=0": dial unix /var/run/docker.sock: socket: too many open files
ERRO[0029] failed to get cluster: No nodes found for given cluster
FATA[0029] Cluster creation FAILED, also FAILED to rollback changes!
Which OS & Architecture
macOS
Which version of k3d
$ k3d version
k3d version v5.0.1
k3s version v1.21.5-k3s2 (default)
Which version of docker
$ docker version
Client:
 Cloud integration: 1.0.17
 Version: 20.10.8
 API version: 1.41
 Go version: go1.16.6
 Git commit: 3967b7d
 Built: Fri Jul 30 19:55:20 2021
 OS/Arch: darwin/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.8
  API version: 1.41 (minimum version 1.12)
  Go version: go1.16.6
  Git commit: 75249d8
  Built: Fri Jul 30 19:52:10 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.9
  GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version: 1.0.1
  GitCommit: v1.0.1-0-g4144b63
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
$ docker info
Client:
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  compose: Docker Compose (Docker Inc., v2.0.0-rc.3)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 38
  Running: 37
  Paused: 0
  Stopped: 1
 Images: 29
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.47-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 8.746GiB
 Name: docker-desktop
 ID: OOQ5:YAWJ:WD54:2OKW:K2QS:WU72:R5RR:YBPB:OFUV:NDYI:OYC3:TIUA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
All of my colleagues running on macOS who tried to install v5 also encountered this error.
I'm using Ubuntu 20.04, where the installation succeeded, so I'm guessing this might be tied to some resource limits imposed by Docker Desktop?
Happens to me too, on Fedora 34:
❯ docker -v
Docker version 20.10.9, build c2ea9bc
❯ uname -r
5.13.12-200.fc34.x86_64
❯ cat /etc/os-release
NAME=Fedora
VERSION="34 (Workstation Edition)"
ID=fedora
VERSION_ID=34
VERSION_CODENAME=""
PLATFORM_ID="platform:f34"
PRETTY_NAME="Fedora 34 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:34"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f34/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=34
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=34
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
❯ k3d cluster create --agents 5 --servers 3
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-k3s-default'
INFO[0000] Created volume 'k3d-k3s-default-images'
INFO[0000] Creating initializing server node
INFO[0000] Creating node 'k3d-k3s-default-server-0'
INFO[0000] Starting new tools node...
INFO[0000] Starting Node 'k3d-k3s-default-tools'
INFO[0001] Creating node 'k3d-k3s-default-server-1'
INFO[0002] Creating node 'k3d-k3s-default-server-2'
INFO[0002] Creating node 'k3d-k3s-default-agent-0'
INFO[0002] Creating node 'k3d-k3s-default-agent-1'
INFO[0002] Creating node 'k3d-k3s-default-agent-2'
INFO[0002] Creating node 'k3d-k3s-default-agent-3'
INFO[0002] Creating node 'k3d-k3s-default-agent-4'
INFO[0002] Creating LoadBalancer 'k3d-k3s-default-serverlb'
INFO[0002] Using the k3d-tools node to gather environment information
INFO[0002] HostIP: using network gateway...
INFO[0002] Starting cluster 'k3s-default'
INFO[0002] Starting the initializing server...
INFO[0002] Starting Node 'k3d-k3s-default-server-0'
INFO[0003] Deleted k3d-k3s-default-tools
INFO[0003] Starting servers...
INFO[0003] Starting Node 'k3d-k3s-default-server-1'
INFO[0060] Starting Node 'k3d-k3s-default-server-2'
WARNING: Error loading config file: /home/matul/.docker/config.json: open /home/matul/.docker/config.json: too many open files
ERRO[0126] Failed Cluster Start: Failed to start server k3d-k3s-default-server-2: Node k3d-k3s-default-server-2 failed to get ready: Failed waiting for log message 'k3s is up and running' from node 'k3d-k3s-default-server-2': failed to get container for node 'k3d-k3s-default-server-2': Failed to list containers: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22app%3Dk3d%22%3Atrue%2C%22k3d.cluster.imageVolume%3Dk3d-k3s-default-images%22%3Atrue%2C%22k3d.cluster.network.external%3Dfalse%22%3Atrue%2C%22k3d.cluster.network.id%3Dfb0a2002461808fb629611edd8a31e8832e4d0657b440b2884504b2417904918%22%3Atrue%2C%22k3d.cluster.network.iprange%3D172.19.0.0%2F16%22%3Atrue%2C%22k3d.cluster.network%3Dk3d-k3s-default%22%3Atrue%2C%22k3d.cluster.token%3DEIyWVswHHtjRvqtsbudj%22%3Atrue%2C%22k3d.cluster.url%3Dhttps%3A%2F%2Fk3d-k3s-default-server-0%3A6443%22%3Atrue%2C%22k3d.cluster%3Dk3s-default%22%3Atrue%2C%22k3d.role%3Dserver%22%3Atrue%2C%22k3d.server.api.host%3D0.0.0.0%22%3Atrue%2C%22k3d.server.api.hostIP%3D0.0.0.0%22%3Atrue%2C%22k3d.server.api.port%3D42217%22%3Atrue%2C%22k3d.server.init%3Dfalse%22%3Atrue%2C%22k3d.version%3Dv5.0.1%22%3Atrue%7D%2C%22name%22%3A%7B%22%5E%2F%3F%28k3d-%29%3Fk3d-k3s-default-server-2%24%22%3Atrue%7D%7D&limit=0": dial unix /var/run/docker.sock: socket: too many open files
ERRO[0126] Failed to create cluster >>> Rolling Back
INFO[0126] Deleting cluster 'k3s-default'
WARNING: Error loading config file: /home/matul/.docker/config.json: open /home/matul/.docker/config.json: too many open files
ERRO[0126] Failed to get nodes for cluster 'k3s-default': docker failed to get containers with labels 'map[k3d.cluster:k3s-default]': failed to list containers: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22app%3Dk3d%22%3Atrue%2C%22k3d.cluster%3Dk3s-default%22%3Atrue%7D%7D&limit=0": dial unix /var/run/docker.sock: socket: too many open files
ERRO[0126] failed to get cluster: No nodes found for given cluster
FATA[0126] Cluster creation FAILED, also FAILED to rollback changes!
Hi @dennis-ge, thanks for opening this issue! Unfortunately, there's not much we can do here. I just went over the code again to ensure that k3d properly closes all connections (which require file descriptors) as soon as they're not needed anymore, but there are probably still a lot of them, especially when using multiple nodes (as k3d e.g. needs to follow the logs of every node to get status information). On Linux hosts, you can increase the limits permanently via sysctl.conf (as per https://www.ibm.com/support/pages/increasing-maximum-number-open-files-linux-host) :thinking:
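For example, a minimal sketch for a Linux host (all values are illustrative; see the linked page for details):

```sh
# Temporary: raise the soft open-file limit for the current shell only
ulimit -n 4096

# Persistent, system-wide kernel cap via sysctl.conf (illustrative value)
echo "fs.file-max = 65536" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Persistent per-user limits usually live in /etc/security/limits.conf, e.g.:
#   <user>  soft  nofile  4096
#   <user>  hard  nofile  8192
```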
FWIW - I recently found that for docker containers to go beyond the default limit of 1024 open files, you must pass in a ulimit argument. Since k3d is built on docker, would this apply/help?
For context, I too am running into issues where, when deploying too many applications in k3d (e.g. a company app and Loki), Loki cannot start due to too many open files.
We even faced this in our GitHub Actions pipelines. Unfortunately, it indeed has to be fixed at the host level, not the k3d level.
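For example, in a CI shell step (a sketch; the limit value and cluster name are hypothetical):

```sh
# ulimit only affects the current shell and its children, so raise it
# in the same step/script that creates the cluster:
ulimit -n 4096
k3d cluster create ci-cluster --agents 1
```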
@lgass can you elaborate, please? Increasing the limit with ulimit was mentioned before already. Do you have additional insights? Do you mean setting something like docker's --ulimit flag for the k3d node containers? That would be a different issue then.
Yes, even though you set it at the host kernel level, I have (outside of k3d) needed to tell docker to also respect a higher limit than the default (1024) using the --ulimit parameter. Since I have faced similar issues with applications inside k3d, I was just thinking that this parameter could also be useful when initializing the k3d containers, if that has not already been taken into account.
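For illustration, this is the docker-level parameter being referred to, shown on a plain container rather than a k3d node (values are illustrative):

```sh
# --ulimit nofile=<soft>:<hard> sets the open-file limits inside the container;
# this prints the resulting soft limit from a throwaway container:
docker run --rm --ulimit nofile=65536:65536 alpine sh -c 'ulimit -n'
```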