`port 80 is already in use. Please check the flag --http-port` on GKE, ingress-nginx versions 1.1.2 and 1.1.3

praveenperera opened this issue 3 years ago • 21 comments

I deployed ingress-nginx using Helm with: https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.2/deploy/static/provider/cloud/deploy.yaml

Today I noticed that all my workloads went down and the ingress-nginx-controller was in a crash loop with the error:

port 80 is already in use. Please check the flag --http-port

I tried updating to v1.1.3; that did not fix it, but downgrading to v1.1.1 did.
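
For anyone triaging the same crash loop, a minimal sketch for pulling the failing pod's output, assuming the default ingress-nginx namespace (the pod name is illustrative):

kubectl -n ingress-nginx get pods
kubectl -n ingress-nginx logs ingress-nginx-controller-xxxxx --previous   # output of the last crashed container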

praveenperera avatar Apr 12 '22 02:04 praveenperera

@praveenperera: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 12 '22 02:04 k8s-ci-robot

/remove-kind bug
/kind support

What process was/is using that port?

longwuyuan avatar Apr 12 '22 02:04 longwuyuan

As far as I can tell, nothing else was using it. It's weird because I deployed this a few days ago and it was working fine. It randomly stopped working after a restart. Before downgrading to v1.1.1, I tried recycling all the nodes. I also uninstalled ingress-nginx completely using Helm and reinstalled (new load balancer and all).

Sorry if this isn't the most detailed report, but it was random and fixed by downgrading. I thought I would report it in case someone else ran into the same thing. Other than downgrading, I didn't change anything else.

praveenperera avatar Apr 12 '22 02:04 praveenperera

Do you know if the problem can be reproduced?

longwuyuan avatar Apr 12 '22 02:04 longwuyuan

Do you know if the problem can be reproduced?

Since I saw the same problem after removing and reinstalling completely using Helm, I think it can probably be reproduced.

I will try and spin up a new test cluster tomorrow and let you know if I run into the same issue.

praveenperera avatar Apr 12 '22 02:04 praveenperera

I will try on a brand-new cluster without anything on it today. But I forgot to mention that I have two clusters on GCP GKE (dev and staging), both set up the exact same way. Both had the same problem on ingress-nginx 1.1.2 and 1.1.3, and both started working when downgraded to 1.1.1.

praveenperera avatar Apr 13 '22 14:04 praveenperera

@praveenperera I have the same problem since this morning. I already had version 1.1.1 installed. With version 1.1.0 it worked again.

egobude avatar Apr 14 '22 06:04 egobude

Thanks @egobude I'll downgrade to v1.1.0 if it happens again. But so far so good on v1.1.1

praveenperera avatar Apr 14 '22 14:04 praveenperera

What version of k8s? The default static deploys are generated for 1.20.

Specific ones are available at deploy/static/provider/cloud/VERSION/
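
For example, a sketch of applying a version-specific static manifest, substituting your Kubernetes version for VERSION per the path above (the per-version layout is taken from the comment, not verified here):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.3/deploy/static/provider/cloud/VERSION/deploy.yaml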

strongjz avatar Apr 15 '22 19:04 strongjz

I have the same issue with Kubernetes v1.22.8 and containerd://1.5.8. The initial installation was fine using Kubespray v2.18.1. After reinstallation, the ingress-nginx controller on one node can't start:

NAME                             READY   STATUS             RESTARTS        AGE
ingress-nginx-controller-8rxh6   1/1     Running            0               24m
ingress-nginx-controller-cct24   1/1     Running            0               11m
ingress-nginx-controller-kr99w   1/1     Running            0               9m36s
ingress-nginx-controller-kvl5f   1/1     Running            0               10m
ingress-nginx-controller-svqsd   0/1     CrashLoopBackOff   5 (2m49s ago)   5m41s

The logs from the failed pod (k logs -f ingress-nginx-controller-svqsd):

NGINX Ingress controller
  Release:       v1.0.4
  Build:         9b78b6c197b48116243922170875af4aa752ee59
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.9


F0424 07:43:51.932493 7 main.go:67] port 80 is already in use. Please check the flag --http-port

... I have also tried to change the daemonset ports:

ports:
- containerPort: 80
  hostPort: 8888

It doesn't help. Port 80 isn't used anyway on the server:

netstat -lnpt | grep 80
tcp   0   0 0.0.0.0:8081   0.0.0.0:*   LISTEN   3109/nginx: master

hudec avatar Apr 24 '22 07:04 hudec

Upgrading from v1.0.4 to v1.2.0 solved this issue.

hudec avatar Apr 24 '22 09:04 hudec

Version 1.2.1 appears to still have this issue, but it does not occur in all environments. We have 2 identical machines with identical Docker + k8s + Helm setups, but one has this issue and the other does not. We downgraded the one that did not work to version 1.0.0 and it works, while the other machine chugs away on 1.2.1.

Thinking very hard about how to explain this to the customer.

ywptrg avatar Jun 29 '22 09:06 ywptrg

It's relatively easy to discover which process ID is using port 80 with lsof or netstat etc. That is outside the scope of this project.

It would help to see data showing that installing the ingress-nginx-controller first spawns a process that occupies port 80, and that the installation then spawns a second process that also wants to bind to the same port 80. This is quite unlikely; otherwise several or all users would report it. If I install the ingress-nginx-controller on minikube or kind, I cannot reproduce the problem of port 80 being occupied.

So kindly find the process that has occupied port 80 and kill that process. Make sure port 80 is not occupied, and only then install the ingress-nginx-controller. Thanks
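
For example, a quick check on the affected node, assuming lsof and ss are available there:

sudo lsof -iTCP:80 -sTCP:LISTEN    # list any process listening on TCP port 80
sudo ss -ltnp 'sport = :80'        # same information via ss, including the owning PID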

longwuyuan avatar Jun 29 '22 09:06 longwuyuan

(quoting hudec's report above: "I have the same issue with Kubernetes v1.22.8 and containerd://1.5.8 ...")

I am getting the same issue as well with ingress-nginx version 1.2.0; suddenly pods are failing with the error "port 80 is already in use. Please check the flag --http-port". It was working fine with version 1.0.0.

ayush-jain1 avatar Jul 21 '22 07:07 ayush-jain1

There have been several successful installations, even after the release of v1.3.0 of the controller, so this does not look like a problem in the controller.

Mentioning it a second time since it seems relevant: tools like lsof, netstat etc. can detect which process owns the currently occupied port 80.

If someone can post a step-by-step procedure that someone else can copy/paste into a minikube or kind cluster, then some analysis of the resulting data, like logs and configs, is possible.

Otherwise this could be an environment-specific problem where processes don't die when they are expected to. Better to discuss this in the Slack channel ingress-nginx-users, as there are more people there.

longwuyuan avatar Jul 21 '22 07:07 longwuyuan

I have the same problem; below is the relevant information.

Version information:
Kubernetes: v1.21.14
Ingress-nginx: 1.2.0
nginx version: nginx/1.19.10
Docker version: 19.03.14
OS: CentOS Linux release 7.9.2009
Linux: 3.10.0-1160.el7.x86_64

Using lsof -i:80 or netstat -lanp | grep 80, no related process is found.

Pods are restarting continuously, and the ingress-nginx-controller events show:

MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
Back-off restarting failed container

But when I execute kubectl get secret -n ingress-nginx | grep admission, ingress-nginx-admission exists:

ingress-nginx-admission               Opaque                                3   6m5s
ingress-nginx-admission-token-jxx6w   kubernetes.io/service-account-token   3   6m7s

The container logs show:

F0816 02:25:56.138279 6 main.go:67] port 80 is already in use. Please check the flag --http-port

Can anyone help? Thanks.

BearDare avatar Aug 16 '22 02:08 BearDare

Can you show kubectl get po,svc -A

Thanks, Long


longwuyuan avatar Aug 16 '22 03:08 longwuyuan

(replying to: "Can you show kubectl get po,svc -A")

ok:

:~$ kubectl get po,svc -A
NAMESPACE          NAME                                           READY   STATUS      RESTARTS   AGE
calico-apiserver   pod/calico-apiserver-6997db6c66-lqhrw          1/1     Running     0          3m7s
calico-apiserver   pod/calico-apiserver-6997db6c66-vd98d          1/1     Running     0          3m7s
calico-system      pod/calico-kube-controllers-79f7986874-4h4jd   1/1     Running     0          3m38s
calico-system      pod/calico-node-fzp5d                          1/1     Running     0          3m38s
calico-system      pod/calico-typha-79f775bdc4-xgk54              1/1     Running     0          3m38s
prod               pod/nfs-client-provisioner-7d9d74b787-q5h8c    1/1     Running     0          2m15s
ingress-nginx      pod/ingress-nginx-admission-create-9g69c       0/1     Completed   0          3m39s
ingress-nginx      pod/ingress-nginx-admission-patch-sxkp2        0/1     Completed   1          3m39s
ingress-nginx      pod/ingress-nginx-controller-rgm8v             0/1     Error       5          3m26s
kube-system        pod/coredns-7656c86b69-lfvrv                   1/1     Running     0          3m43s
kube-system        pod/coredns-7656c86b69-rbhnr                   1/1     Running     0          3m43s
kube-system        pod/etcd-192-168-89-214                        1/1     Running     4          3m58s
kube-system        pod/kube-apiserver-192-168-89-214              1/1     Running     4          3m58s
kube-system        pod/kube-controller-manager-192-168-89-214     1/1     Running     5          3m58s
kube-system        pod/kube-proxy-xhqlm                           1/1     Running     0          3m43s
kube-system        pod/kube-scheduler-192-168-89-214              1/1     Running     5          3m59s
tigera-operator    pod/tigera-operator-7cdb76dd8b-45mcb           1/1     Running     0          3m43s

NAMESPACE          NAME                                         TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)                                     AGE
calico-apiserver   service/calico-api                           ClusterIP      10.91.3.4                   443/TCP                                     3m7s
calico-system      service/calico-kube-controllers-metrics      ClusterIP      10.91.3.174                 9094/TCP                                    3m16s
calico-system      service/calico-typha                         ClusterIP      10.91.2.208                 5473/TCP                                    3m38s
default            service/kubernetes                           ClusterIP      10.91.0.1                   443/TCP                                     4m1s
ingress-nginx      service/ingress-nginx-controller             LoadBalancer   10.91.2.187                 80:32397/TCP,443:32170/TCP,9000:32166/TCP   3m39s
ingress-nginx      service/ingress-nginx-controller-admission   ClusterIP      10.91.3.66                  443/TCP                                     3m39s
kube-system        service/kube-dns                             ClusterIP      10.91.0.10                  53/UDP,53/TCP,9153/TCP                      3m59s

BearDare avatar Aug 16 '22 04:08 BearDare

  • it's a problem specific to your environment, because I can install without this problem
  • neither the post from the creator of this issue nor your post contains any information for anyone to analyse and understand the environment where the problem is happening
  • managing the hosts of your cluster is not in the scope of this project
  • you can install on minikube on a laptop and verify that the controller works
  • if information is posted as asked in the issue template, someone may find it useful for analysis, provided the logs and other debug info are available

longwuyuan avatar Aug 16 '22 05:08 longwuyuan

I have the same issue; it occurred after I changed containerd's root directory from its default path to /home.

What I did:

mkdir /home/lib/containerd/ -p
systemctl stop containerd.service
cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
## change /etc/containerd/config.toml
## root = "/home/lib/containerd"
rsync -aP /var/lib/containerd/ /home/lib/containerd/
systemctl start containerd.service
systemctl status containerd.service

Maybe something happened because of the permission change?

the error:

2022-09-15T13:09:28.817280692+08:00 stdout F -------------------------------------------------------------------------------
2022-09-15T13:09:28.817309656+08:00 stdout F NGINX Ingress controller
2022-09-15T13:09:28.817312188+08:00 stdout F   Release:       v1.3.0
2022-09-15T13:09:28.817313613+08:00 stdout F   Build:         2b7b74854d90ad9b4b96a5011b9e8b67d20bfb8f
2022-09-15T13:09:28.817315196+08:00 stdout F   Repository:    https://github.com/kubernetes/ingress-nginx
2022-09-15T13:09:28.817316827+08:00 stdout F   nginx version: nginx/1.19.10
2022-09-15T13:09:28.817318018+08:00 stdout F 
2022-09-15T13:09:28.817319509+08:00 stdout F -------------------------------------------------------------------------------
2022-09-15T13:09:28.817320765+08:00 stdout F 
2022-09-15T13:09:28.817678488+08:00 stderr F F0915 05:09:28.817609       7 main.go:67] port 80 is already in use. Please check the flag --http-port
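
One plausible cause, connecting this to the capability findings later in this thread: file capabilities such as cap_net_bind_service are stored in the security.capability extended attribute, and rsync -a does not copy xattrs unless asked to. A hedged sketch of a copy that would preserve them (the flags are standard rsync, but this is untested against this exact setup):

## -X copies extended attributes (where file capabilities live); -A copies ACLs
rsync -aPXA /var/lib/containerd/ /home/lib/containerd/

## spot-check that a binary kept its capability after the copy (path is illustrative)
getcap /home/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/nginx-ingress-controller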

iarchean avatar Sep 15 '22 05:09 iarchean

Hi, if there is a problem with the controller, we would love to get proof, so that it can be used to debug and fix the problem in the controller.

longwuyuan avatar Sep 15 '22 06:09 longwuyuan

I am having this issue on a server and I think that the error message is misleading. IsPortAvailable checks whether it's possible to bind to the port.

If that returns false, then the error is assumed (and logged) to be "already in use".

In fact there are other reasons why it may not be possible to bind to the port. I'm still digging into why this is happening on one of my nodes, but I did verify the following (see the sketch after this list):

  • if I run a bare pod with the same image and the default user=101, it fails with this error
  • if I run a bare pod with the same image and user=0, it works (and fails on other things, which is expected when run out of context)
  • in the run-as-root pod, if I set /nginx-ingress-controller to 4755 (setuid to www-data, which already owns it), then it fails with the port-binding error

So on this system I am really encountering a permission issue with binding to the port, but it is being reported as "port already in use".
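
A minimal Go sketch of the failure mode described here (not the controller's actual code): a plain bind check conflates EADDRINUSE with EACCES, which is exactly the distinction that gets lost in the "already in use" message.

package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

func main() {
	// Try to bind the HTTP port the same way a naive availability check would.
	ln, err := net.Listen("tcp", ":80")
	if err != nil {
		switch {
		case errors.Is(err, syscall.EADDRINUSE):
			fmt.Println("port 80 is genuinely in use by another process")
		case errors.Is(err, syscall.EACCES):
			fmt.Println("bind refused: missing privilege (e.g. no cap_net_bind_service)")
		default:
			fmt.Println("bind failed for another reason:", err)
		}
		return
	}
	ln.Close()
	fmt.Println("port 80 is available")
}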

jrhunger avatar Oct 07 '22 18:10 jrhunger

Looks like in the v1.2.0 image (k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185) the /nginx-ingress-controller binary does not have the cap_net_bind_service capability applied, whereas v1.3.0, v1.4.0, and v1.1.3 do.

Tested by creating the pod below, approximating how the controller is run by the deployment. Change the name and image to what is listed on the releases page for each of the listed versions. Exec into the pod and run this:

apk add libcap
getcap /nginx-ingress-controller

On v1.1.3, v1.3.0, and v1.4.0 the results look like this:

bash-5.1# getcap /nginx*
/nginx-ingress-controller cap_net_bind_service=ep

On v1.2.0, it looks like this:

bash-5.1# getcap /nginx-ingress-controller
bash-5.1#

The test pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-root-v120
  namespace: ciok-test
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "1"
      command: ["sleep"]
      args: ["3600"]
      ports:
      - containerPort: 80
        name: http
        protocol: TCP
      - containerPort: 443
        name: https
        protocol: TCP
      securityContext:
        allowPrivilegeEscalation: true
        capabilities:
          add:
          - NET_BIND_SERVICE
          drop:
          - ALL
        runAsUser: 0
  restartPolicy: Never
  nodeSelector:
    kubernetes.io/hostname: k8s-dev-car4-w1
  tolerations:
  - key: "node.kubernetes.io/unschedulable"
    operator: "Exists"
    effect: NoSchedule

Also confirmed that adding the capability on v1.2.0 makes it work:

bash-5.1# apk add getcap
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
ERROR: unable to select packages:
  getcap (no such package):
    required by: world[getcap]
bash-5.1# apk add libcap
(1/1) Installing libcap (2.50-r0)
Executing busybox-1.33.1-r7.trigger
OK: 26 MiB in 41 packages
bash-5.1# getcap /nginx-ingress-controller
bash-5.1# chmod 4755 /nginx-ingress-controller
bash-5.1# /nginx-ingress-controller
..
F1007 19:50:10.452548      19 main.go:67] port 80 is already in use. Please check the flag --http-port
goroutine 1 [running]:

bash-5.1# setcap cap_net_bind_service+ep /nginx-ingress-controller
bash-5.1# getcap /nginx-ingress-controller
/nginx-ingress-controller cap_net_bind_service=ep

bash-5.1# /nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.2.0
  Build:         a2514768cd282c41f39ab06bda17efefc4bd233a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

W1007 19:52:31.541811      35 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1007 19:52:31.542003      35 main.go:230] "Creating API client" host="https://172.16.128.1:443"
I1007 19:52:31.551388      35 main.go:274] "Running in Kubernetes cluster" major="1" minor="24" git="v1.24.3" state="clean" commit="aef86a93758dc3cb2c658dd9657ab4ad4afc21cb" platform="linux/amd64"
I1007 19:52:31.640657      35 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
W1007 19:52:31.643727      35 main.go:114] No permissions to list and get Ingress Classes: ingressclasses.networking.k8s.io is forbidden: User "system:serviceaccount:ciok-test:default" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope, IngressClass feature will be disabled
F1007 19:52:31.643746      35 main.go:123] Unexpected error obtaining ingress-nginx pod: unable to get POD information (missing POD_NAME or POD_NAMESPACE environment variable
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)

Still not sure why v1.2.0 is working on some nodes and not others, though. I suspect that is a system issue unrelated to ingress-nginx.

jrhunger avatar Oct 07 '22 19:10 jrhunger

Leaving the above comment in case it helps someone else in troubleshooting, but in fact when I run the v1.2.0 image on a working node, I do see cap_net_bind_service=ep on /nginx-ingress-controller. So I would say the main issue here for this project is the potentially misleading error.

Furthermore, after removing all ingress-nginx controller pods on the node, running crictl rmi --prune (to clean up unused images), and re-running the pod on the node (so the image would be pulled fresh), it worked fine.
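
A sketch of that recovery sequence on the affected node (the pod name is illustrative; with a daemonset the deleted pod is rescheduled automatically):

kubectl -n ingress-nginx delete pod ingress-nginx-controller-xxxxx   # remove the crashing controller pod from the node
crictl rmi --prune                                                   # prune unused images so the next start pulls fresh layers
## the rescheduled pod re-pulls the image, with intact file capabilities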

This led me to believe there must have been some corruption in the cached image layers, removing the capability from the underlying file in /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/... I was able to replicate that by stopping the pod, removing the capability from the underlying file that had it, and launching the pod again (which then failed):

## find files named nginx-ingress-controller. 
$ find . | grep nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299739/fs/nginx-ingress-controller

## higher-numbered one has the capability, lower-numbered one does not, presumably a lower layer of the image before that was set (note - the image could probably be smaller if the file was added and setcap run in one layer)
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299739/fs/nginx-ingress-controller
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller = cap_net_bind_service+ep

## remove it from higher-numbered one:
$ setcap -r ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
$ getcap ./io.containerd.snapshotter.v1.overlayfs/snapshots/299742/fs/nginx-ingress-controller
$

## running a new pod, it encounters the issue

jrhunger avatar Oct 07 '22 19:10 jrhunger

@jrhunger thank you very much for this update. Kindly allow me to recap before I check, because there was discussion and I think even a PR around CAP_NET_BIND_SERVICE.

Do I understand correctly from your above post that even the suspected v1.2.0 of the controller works, and there is no flawed code like a missing CAP_NET_BIND_SERVICE in v1.2.0 of the controller?

longwuyuan avatar Oct 07 '22 22:10 longwuyuan

@longwuyuan correct, the v1.2.0 image has the proper cap_net_bind_service when freshly pulled.

jrhunger avatar Oct 07 '22 23:10 jrhunger

Can someone add this to the troubleshooting documentation? @jrhunger, if you don't mind, we would greatly appreciate it.

strongjz avatar Oct 07 '22 23:10 strongjz

I have also run into this issue, and the error message is misleading. I plan to fix it to make it more appropriate and accurate.

strongjz avatar Oct 08 '22 13:10 strongjz

@strongjz can you have a look at https://github.com/jrhunger/ingress-nginx/blob/troubleshooting-docs-update-ports/docs/troubleshooting.md#unable-to-listen-on-port-80443 and let me know if that is the kind of thing you are looking for?

jrhunger avatar Oct 13 '22 18:10 jrhunger

@jrhunger that looks great, thank you

strongjz avatar Oct 13 '22 20:10 strongjz