ingress-nginx
`port 80 is already in use. Please check the flag --http-port` on GKE, ingress-nginx version 1.1.2 and 1.1.3
I deployed ingress-nginx using Helm with: https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.2/deploy/static/provider/cloud/deploy.yaml
Today I noticed that all my workloads went down and the ingress-nginx-controller was in a crash loop with the error:
port 80 is already in use. Please check the flag --http-port
I tried updating to v1.1.3. That did not fix it, but downgrading to v1.1.1 did.
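For anyone pinning a specific release in the meantime, a minimal sketch using the same static-manifest pattern as above (the controller-v1.1.1 tag is an assumption here; substitute the release you need):

## apply the static cloud manifest for a pinned controller release
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.1/deploy/static/provider/cloud/deploy.yaml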
@praveenperera: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug /kind support
What process was/is using that port?
As far as I can tell nothing else was using it. It's weird because I deployed this a few days ago and it was working fine. It randomly stopped working after a restart. Before downgrading to v1.1.1 I tried recycling all the nodes. I also uninstalled ingress-nginx completely using helm and reinstalled (new load balancer and all).
Sorry if this isn't the most detailed report, but it was random and fixed by downgrading. I thought I would report it in case someone else runs into the same thing. Other than downgrading I didn't change anything else.
Do you know if the problem can be reproduced?
Since I saw the same problem after completely removing and reinstalling with Helm, I think it can probably be reproduced.
I will try and spin up a new test cluster tomorrow and let you know if I run into the same issue.
I will try on a brand new cluster without anything on it today. But I forgot to mention that I have two clusters on GCP GKE (dev and staging), both set up the exact same way. Both had the same problem on ingress-nginx 1.1.2 and 1.1.3, and both started working when downgraded to 1.1.1.
@praveenperera I have had the same problem since this morning. I already had version 1.1.1 installed. With version 1.1.0 it worked again.
Thanks @egobude I'll downgrade to v1.1.0 if it happens again. But so far so good on v1.1.1
What version of k8s? The default static deploys are generated for 1.20.
Specific ones are available at deploy/static/provider/cloud/VERSION/
I have the same issue with Kubernetes v1.22.8 and containerd://1.5.8. The initial installation was fine using Kubespray v2.18.1. After reinstallation, the ingress-nginx controller on one node can't start:

NAME                             READY   STATUS             RESTARTS        AGE
ingress-nginx-controller-8rxh6   1/1     Running            0               24m
ingress-nginx-controller-cct24   1/1     Running            0               11m
ingress-nginx-controller-kr99w   1/1     Running            0               9m36s
ingress-nginx-controller-kvl5f   1/1     Running            0               10m
ingress-nginx-controller-svqsd   0/1     CrashLoopBackOff   5 (2m49s ago)   5m41s
The logs from the failed pod (k logs -f ingress-nginx-controller-svqsd):

NGINX Ingress controller
Release: v1.0.4
Build: 9b78b6c197b48116243922170875af4aa752ee59
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9

F0424 07:43:51.932493 7 main.go:67] port 80 is already in use. Please check the flag --http-port
...

I have also tried to change the daemonset ports:

- containerPort: 80
  hostPort: 8888

It doesn't help. Port 80 isn't used on the server anyway:

netstat -lnpt | grep 80
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      3109/nginx: master
Upgrading from v1.0.4 to v1.2.0 solved this issue.
Version 1.2.1 appears to still have this issue, but it does not occur in all environments. We have 2 identical machines with identical Docker + k8s + Helm setups, but one has this issue and the other does not. I downgraded the one that did not work to version 1.0.0 and it works, while the other machine chugs along with 1.2.1.
Thinking very hard about how to explain this to the customer.
It's relatively easy to discover which process ID is using port 80 with lsof, netstat, etc. That is outside the scope of this project.
It would help to see data showing that installing the ingress-nginx-controller first spawns a process that occupies port 80, and that the installation then spawns a second process that also wants to bind to the same port 80. That is very unlikely; otherwise several or all users would be reporting this. If I install the ingress-nginx-controller on minikube or kind, I cannot reproduce the problem of port 80 being occupied.
So kindly find the process that has occupied port 80 and kill it, make sure port 80 is not occupied, and only then install the ingress-nginx-controller. Thanks.
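A minimal sketch of that check, run on the affected node (assumes ss, netstat, or lsof is available there):

## list anything currently listening on port 80 on the node
ss -ltnp 'sport = :80'
## alternatives that also show the owning PID/command
netstat -lnpt | grep ':80 '
lsof -iTCP:80 -sTCP:LISTEN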
I am getting the same issue as well with ingress-nginx version 1.2.0; suddenly pods are failing with the error "port 80 is already in use. Please check the flag --http-port"... It was working fine with version 1.0.0.
There have been several successful installations, even after the release of v1.3.0 of the controller, so this does not look like a problem in the controller.
It seems relevant to mention a second time that a port being in use can be detected with tools like lsof, netstat, etc., to find out which process owns the currently occupied port 80.
If someone can post a step-by-step procedure that someone else can copy/paste into a minikube or kind cluster, then some analysis of the resulting data, like logs and configs, is possible.
Otherwise this could be an environment-specific problem where processes don't die when they are expected to. It is better to discuss this in the Slack channel ingress-nginx-users, as there are more people there.
I have the same problem; below is the relevant information.
Version information:
Kubernetes: v1.21.14
Ingress-nginx: 1.2.0
nginx version: nginx/1.19.10
Docker version: 19.03.14
OS: CentOS Linux release 7.9.2009
Linux: 3.10.0-1160.el7.x86_64
Using lsof -i:80 or netstat -lanp | grep 80, no related process is found.
Pods are restarting continuously; the ingress-nginx-controller events show the events below:
MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
Back-off restarting failed container
But when I execute 'kubectl get secret -n ingress-nginx | grep admission', ingress-nginx-admission exists:
ingress-nginx-admission               Opaque                                3      6m5s
ingress-nginx-admission-token-jxx6w   kubernetes.io/service-account-token   3      6m7s
The container logs contain the message below:
F0816 02:25:56.138279 6 main.go:67] port 80 is already in use. Please check the flag --http-port
Can anyone help? Thanks.
Can you show kubectl get po,svc -A?
Thanks, Long
ok,
:~$ kubectl get po,svc -A
NAMESPACE          NAME                                           READY   STATUS      RESTARTS   AGE
calico-apiserver   pod/calico-apiserver-6997db6c66-lqhrw          1/1     Running     0          3m7s
calico-apiserver   pod/calico-apiserver-6997db6c66-vd98d          1/1     Running     0          3m7s
calico-system      pod/calico-kube-controllers-79f7986874-4h4jd   1/1     Running     0          3m38s
calico-system      pod/calico-node-fzp5d                          1/1     Running     0          3m38s
calico-system      pod/calico-typha-79f775bdc4-xgk54              1/1     Running     0          3m38s
prod               pod/nfs-client-provisioner-7d9d74b787-q5h8c    1/1     Running     0          2m15s
ingress-nginx      pod/ingress-nginx-admission-create-9g69c       0/1     Completed   0          3m39s
ingress-nginx      pod/ingress-nginx-admission-patch-sxkp2        0/1     Completed   1          3m39s
ingress-nginx      pod/ingress-nginx-controller-rgm8v             0/1     Error       5          3m26s
kube-system        pod/coredns-7656c86b69-lfvrv                   1/1     Running     0          3m43s
kube-system        pod/coredns-7656c86b69-rbhnr                   1/1     Running     0          3m43s
kube-system        pod/etcd-192-168-89-214                        1/1     Running     4          3m58s
kube-system        pod/kube-apiserver-192-168-89-214              1/1     Running     4          3m58s
kube-system        pod/kube-controller-manager-192-168-89-214     1/1     Running     5          3m58s
kube-system        pod/kube-proxy-xhqlm                           1/1     Running     0          3m43s
kube-system        pod/kube-scheduler-192-168-89-214              1/1     Running     5          3m59s
tigera-operator    pod/tigera-operator-7cdb76dd8b-45mcb           1/1     Running     0          3m43s

NAMESPACE          NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
calico-apiserver   service/calico-api     ClusterIP   10.91.3.4
- It's a problem specific to your environment, because I can install without this problem.
- Neither the post of the creator of this issue nor your post contains any information for anyone to analyse and understand the environment where the problem is happening.
- Managing the hosts of your cluster is not in the scope of this project.
- You can install on minikube on a laptop and verify that the controller works.
- If information is posted as asked in the issue template, someone may find it useful to analyse, provided the logs and other debug info are available.
I have the same issue; it occurred after I changed containerd's root directory from its default path to /home.
What I did:
mkdir /home/lib/containerd/ -p
systemctl stop containerd.service
cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
## change /etc/containerd/config.toml
## root = "/home/lib/containerd"
rsync -aP /var/lib/containerd/ /home/lib/containerd/
systemctl start containerd.service
systemctl status containerd.service
Maybe something happened because of the permission change?
the error:
2022-09-15T13:09:28.817280692+08:00 stdout F -------------------------------------------------------------------------------
2022-09-15T13:09:28.817309656+08:00 stdout F NGINX Ingress controller
2022-09-15T13:09:28.817312188+08:00 stdout F Release: v1.3.0
2022-09-15T13:09:28.817313613+08:00 stdout F Build: 2b7b74854d90ad9b4b96a5011b9e8b67d20bfb8f
2022-09-15T13:09:28.817315196+08:00 stdout F Repository: https://github.com/kubernetes/ingress-nginx
2022-09-15T13:09:28.817316827+08:00 stdout F nginx version: nginx/1.19.10
2022-09-15T13:09:28.817318018+08:00 stdout F
2022-09-15T13:09:28.817319509+08:00 stdout F -------------------------------------------------------------------------------
2022-09-15T13:09:28.817320765+08:00 stdout F
2022-09-15T13:09:28.817678488+08:00 stderr F F0915 05:09:28.817609 7 main.go:67] port 80 is already in use. Please check the flag --http-port
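One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the cap_net_bind_service file capability on /nginx-ingress-controller lives in the security.capability extended attribute, and rsync -aP does not copy extended attributes unless -X is added, so a copy like the one above could strip the capability from the binary inside the relocated image layers. A minimal check and a preserving copy (the snapshot path below is illustrative):

## check whether the capability survived the copy (path is illustrative)
getcap /home/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/nginx-ingress-controller
## re-copy preserving xattrs, ACLs, and hard links instead of plain -aP
rsync -aAXHP /var/lib/containerd/ /home/lib/containerd/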
Hi, if there is a problem with the controller, we would love to get the proof, so that it can be used to debug and fix the problem in the controller.
I am having this issue on a server and I think that the error message is misleading. IsPortAvailable checks if it's possible to bind to the port.
If that returns false, the error is assumed (and logged) to be "already in use".
In fact there are other reasons why it may not be possible to bind to the port. I'm still digging into why this is happening on one of my nodes, but I did verify that:
- if I run a bare pod with the same image and the default user=101, it fails with this error
- if I run a bare pod with the same image and user=0, it works (and fails on other things, which is expected when run out of context)
- in the run-as-root pod, if I set /nginx-ingress-controller to 4755 (setuid to www-data, which it is already owned by), it fails with the port-binding error
So on this system I am really encountering a permission issue with binding to the port, but it is being reported as "port 80 is already in use".
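A quick way to see that the two failure modes are distinct (a sketch, assuming a shell inside the image as the unprivileged user; nc here is the BusyBox variant and the exact messages vary):

## without CAP_NET_BIND_SERVICE, binding a privileged port fails with a
## permission error (EACCES), not "address in use"
nc -l -p 80
## a port that is genuinely occupied fails with EADDRINUSE instead
nc -l -p 8080 &
nc -l -p 8080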
It looks like in the v1.2.0 image (k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185) the /nginx-ingress-controller binary does not have the cap_net_bind_service capability applied, whereas v1.3.0, v1.4.0, and v1.1.3 do.
Tested by creating the pod below, approximating how the controller is run by the deployment. Change the name and image to what is listed on the releases page for each of the listed versions. Exec into the pod and run this:
apk add libcap
getcap /nginx-ingress-controller
On v1.1.3, v1.3.0, and v1.4.0 the results look like this:
bash-5.1# getcap /nginx*
/nginx-ingress-controller cap_net_bind_service=ep
On v1.2.0, it looks like this:
bash-5.1# getcap /nginx-ingress-controller
bash-5.1#
apiVersion: v1
kind: Pod
metadata:
  name: nginx-root-v120
  namespace: ciok-test
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "1"
      command: ["sleep"]
      args: ["3600"]
      ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
      securityContext:
        allowPrivilegeEscalation: true
        capabilities:
          add:
            - NET_BIND_SERVICE
          drop:
            - ALL
        runAsUser: 0
  restartPolicy: Never
  nodeSelector:
    kubernetes.io/hostname: k8s-dev-car4-w1
  tolerations:
    - key: "node.kubernetes.io/unschedulable"
      operator: "Exists"
      effect: NoSchedule
Also confirmed that adding the capability on v1.2.0 makes it work:
bash-5.1# apk add getcap
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
ERROR: unable to select packages:
getcap (no such package):
required by: world[getcap]
bash-5.1# apk add libcap
(1/1) Installing libcap (2.50-r0)
Executing busybox-1.33.1-r7.trigger
OK: 26 MiB in 41 packages
bash-5.1# getcap /nginx-ingress-controller
bash-5.1# chmod 4755 /nginx-ingress-controller
bash-5.1# /nginx-ingress-controller
..
F1007 19:50:10.452548 19 main.go:67] port 80 is already in use. Please check the flag --http-port
goroutine 1 [running]:
bash-5.1# setcap cap_net_bind_service+ep /nginx-ingress-controller
bash-5.1# getcap /nginx-ingress-controller
/nginx-ingress-controller cap_net_bind_service=ep
bash-5.1# /nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.2.0
Build: a2514768cd282c41f39ab06bda17efefc4bd233a
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
W1007 19:52:31.541811 35 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1007 19:52:31.542003 35 main.go:230] "Creating API client" host="https://172.16.128.1:443"
I1007 19:52:31.551388 35 main.go:274] "Running in Kubernetes cluster" major="1" minor="24" git="v1.24.3" state="clean" commit="aef86a93758dc3cb2c658dd9657ab4ad4afc21cb" platform="linux/amd64"
I1007 19:52:31.640657 35 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
W1007 19:52:31.643727 35 main.go:114] No permissions to list and get Ingress Classes: ingressclasses.networking.k8s.io is forbidden: User "system:serviceaccount:ciok-test:default" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope, IngressClass feature will be disabled
F1007 19:52:31.643746 35 main.go:123] Unexpected error obtaining ingress-nginx pod: unable to get POD information (missing POD_NAME or POD_NAMESPACE environment variable
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
Still not sure why v1.2.0 is working on some nodes and not others, though. I suspect that is a system issue unrelated to ingress-nginx.
Leaving the above comment in case it helps someone else in troubleshooting, but in fact when I run the v1.2.0 image on a working node, I do see cap_net_bind_service=ep on /nginx-ingress-controller. So I would say the main issue here for this project is the potentially misleading error.
Furthermore, after removing all ingress-nginx controller pods on the node, running crictl rmi --prune (to clean up unused images), and re-running the pod on the node (the image would be pulled fresh), it worked fine.
This led me to believe there must have been some corruption in the cached image layers, removing the capability on the underlying file in /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/... I was able to replicate that by stopping the pod, removing the capability from the underlying file that had it, and launching the pod again (now not working).
## find files named nginx-ingress-controller.
$ find . | grep nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299739/fs/nginx-ingress-controller
## the higher-numbered one has the capability, the lower-numbered one does not, presumably a lower layer of the image before that was set (note - the image could probably be smaller if the file was added and setcap run in one layer)
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299739/fs/nginx-ingress-controller
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller = cap_net_bind_service+ep
## remove it from higher-numbered one:
$ setcap -r ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
$
## running a new pod, it encounters the issue
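A hedged recovery sketch based on the steps described above (assumes containerd with crictl on the node, and that the affected controller pods are removed from the node first so the image becomes unused):

## prune unused images so the corrupted layer is discarded and pulled fresh
crictl rmi --prune
## once the pod is rescheduled, confirm the capability is present again
getcap /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/nginx-ingress-controller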
@jrhunger thank you very much for this update. Kindly allow me to recap before I check, because there was discussion and I think even a PR around CAP_NET_BIND_SERVICE.
Do I understand correctly from your post above that even the suspected v1.2.0 of the controller works, and there is no flawed code like a missing CAP_NET_BIND_SERVICE in v1.2.0 of the controller?
@longwuyuan correct, the v1.2.0 image has the proper cap_net_bind_service when freshly pulled.
Can someone add this to the troubleshooting documentation? @jrhunger, if you don't mind, we would greatly appreciate it.
I have also run into this issue and the error message is misleading. I plan to fix it to make it more appropriate and accurate.
@strongjz can you have a look at https://github.com/jrhunger/ingress-nginx/blob/troubleshooting-docs-update-ports/docs/troubleshooting.md#unable-to-listen-on-port-80443 and let me know if that is the kind of thing you are looking for?
@jrhunger that looks great, thank you