emqx-operator

EMQX-Cluster not working in IPv6-only network

Open axkng opened this issue 3 years ago • 27 comments

Describe the bug After following the getting-started page to set up the emqx-operator, I provisioned an emqx cluster. The pods start and are running, but the status commands return errors:

kubectl exec -n emqx -it emqx-0 -c emqx -- emqx_ctl status
Node '[email protected]' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found
command terminated with exit code 127

To Reproduce Steps to reproduce the behavior:

  1. Deploy the operator to an EKS cluster with Kubernetes 1.22.9
  2. Deploy a simple broker (it can be without persistence; I tested that).
  3. Check the output of the status commands and get errors.
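
The check in step 3 can be repeated for every replica. A minimal sketch, assuming the namespace `emqx`, the container name `emqx`, and pod names `emqx-0` through `emqx-2` as they appear in the commands in this report; `pod_names` and `check_all` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers. Namespace, container, and pod prefix are assumptions
# taken from the commands shown in this report.
NAMESPACE="emqx"
PREFIX="emqx"

# Print the StatefulSet pod names <prefix>-0 .. <prefix>-(n-1), one per line.
pod_names() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "${PREFIX}-${i}"
    i=$((i + 1))
  done
}

# Run `emqx_ctl status` in every pod and report which ones fail.
check_all() {
  for pod in $(pod_names "$1"); do
    if kubectl exec -n "$NAMESPACE" "$pod" -c emqx -- emqx_ctl status; then
      echo "$pod: ok"
    else
      echo "$pod: status check failed"
    fi
  done
}

# Usage (requires cluster access): check_all 3
```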

Expected behavior Not to get errors on the status commands after provisioning a simple broker with no config.

Anything else we need to know?:

Environment details:

  • Kubernetes version: 1.22.9
  • Cloud-provider/provisioner: AWS EKS
  • emqx-operator version: 1.2.4
  • Install method: Helm; EMQX deployed as a custom resource with this manifest:
---
apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6

Did I do something wrong here?

axkng avatar Aug 10 '22 07:08 axkng

Hi, @Furragen Could you please show the emqx-operator logs and the emqx custom resource status? Run the following commands:

kubectl get EmqxBroker emqx -o json | jq '.status'
kubectl logs -f -l "control-plane=controller-manager" -n emqx-operator-system -c manager --tail=100

Rory-Z avatar Aug 10 '22 07:08 Rory-Z

And the emqx pod logs:

kubectl logs emqx-0 -c emqx

Rory-Z avatar Aug 10 '22 07:08 Rory-Z

Hi @Rory-Z , thanks for your quick response.

kubectl get -n emqx EmqxBroker emqx -o json | jq '.status'
{
  "conditions": [
    {
      "lastTransitionTime": "2022-08-10T07:04:09Z",
      "lastUpdateTime": "2022-08-10T07:26:23Z",
      "message": "Some nodes are not ready",
      "reason": "ClusterNotReady",
      "status": "False",
      "type": "Running"
    },
    {
      "lastTransitionTime": "2022-08-10T07:03:26Z",
      "lastUpdateTime": "2022-08-10T07:03:26Z",
      "message": "All default plugins initialized",
      "reason": "PluginInitializeSuccessfully",
      "status": "True",
      "type": "PluginInitialized"
    }
  ],
  "emqxNodes": [
    {
      "node": "[email protected]",
      "node_status": "Running",
      "otp_release": "24.1.5/12.1.5",
      "version": "4.4.6"
    }
  ],
  "readyReplicas": 1,
  "replicas": 3
}
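
The status above reports 1 ready replica out of 3. A small sketch for flagging that mismatch in a script, assuming `jq` is available as in the command above; `replicas_ready` is a hypothetical helper name:

```shell
# Return success only when every desired replica is reported ready.
replicas_ready() {
  ready="$1"
  desired="$2"
  [ "$ready" -eq "$desired" ]
}

# Usage (requires cluster access and jq):
#   status=$(kubectl get -n emqx EmqxBroker emqx -o json \
#     | jq -r '.status | "\(.readyReplicas) \(.replicas)"')
#   replicas_ready $status || echo "cluster not ready: $status"
```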

Logs of the operator (kubectl logs -f -l "control-plane=controller-manager" -n emqx -c manager --tail=100):

Logs

E0810 07:03:59.570352 1 portforward.go:234] lost connection to pod
E0810 07:03:59.838997 1 portforward.go:406] an error occurred forwarding 38417 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:00.358323 1 portforward.go:406] an error occurred forwarding 43423 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:00.358653 1 portforward.go:234] lost connection to pod
1.660115040377564e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "d0295269-0001-4c19-ae4d-2be9e74a7321", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:00.653062 1 portforward.go:406] an error occurred forwarding 46363 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:00.656235 1 portforward.go:234] lost connection to pod
1.6601150413087435e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aed5651b-c774-4496-8e04-41ec215aeb76", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:01.933473 1 portforward.go:406] an error occurred forwarding 36141 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:01.933924 1 portforward.go:234] lost connection to pod
1.6601150419643033e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7207ec58-c48f-4f47-bb51-d156051a2e78", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:02.226753 1 portforward.go:406] an error occurred forwarding 36151 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:02.227081 1 portforward.go:234] lost connection to pod
E0810 07:04:02.639584 1 portforward.go:406] an error occurred forwarding 39885 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:02.639792 1 portforward.go:234] lost connection to pod
1.6601150426838543e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "4ef8f472-14f7-4da7-849b-5220115b9dbc", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:02.948658 1 portforward.go:406] an error occurred forwarding 40737 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:02.948869 1 portforward.go:234] lost connection to pod
E0810 07:04:03.425248 1 portforward.go:406] an error occurred forwarding 34645 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:03.425551 1 portforward.go:234] lost connection to pod
1.660115043447316e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7d86ef9a-e0f3-465b-b1e0-32123f7d2377", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:03.772788 1 portforward.go:406] an error occurred forwarding 37549 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:03.773081 1 portforward.go:234] lost connection to pod
1.6601150441139083e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "300bc970-0ebd-4d1a-a101-6b073ec449e0", "error": "failed to update StatefulSet emqx: Operation cannot be fulfilled on statefulsets.apps \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:04.424648 1 portforward.go:406] an error occurred forwarding 32813 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:04.425127 1 portforward.go:234] lost connection to pod
E0810 07:04:04.909241 1 portforward.go:406] an error occurred forwarding 41449 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:04.909421 1 portforward.go:234] lost connection to pod
1.660115044925919e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "e6941001-14b7-4462-8b90-80e6ab8feac4", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:05.216967 1 portforward.go:406] an error occurred forwarding 34665 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:05.725675 1 portforward.go:406] an error occurred forwarding 41193 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:05.726157 1 portforward.go:234] lost connection to pod
1.6601150457562108e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aaa3c815-7409-4e44-b752-09cff2b0531e", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:06.034595 1 portforward.go:406] an error occurred forwarding 35435 -> 8081: error forwarding port 8081 to pod 862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4, uid : failed to execute portforward in network namespace "/var/run/netns/cni-4d298c69-db9b-1c7e-ebac-314710d61826": failed to connect to localhost:8081 inside namespace "862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:06.034845 1 portforward.go:234] lost connection to pod
E0810 07:04:06.440777 1 portforward.go:406] an error occurred forwarding 33389 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:06.441162 1 portforward.go:234] lost connection to pod
E0810 07:04:06.735334 1 portforward.go:406] an error occurred forwarding 35219 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:06.735726 1 portforward.go:234] lost connection to pod
E0810 07:04:07.043941 1 portforward.go:406] an error occurred forwarding 34727 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:07.044309 1 portforward.go:234] lost connection to pod
E0810 07:04:07.450824 1 portforward.go:406] an error occurred forwarding 34031 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:07.451169 1 portforward.go:234] lost connection to pod
E0810 07:04:07.790990 1 portforward.go:406] an error occurred forwarding 43441 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:07.791210 1 portforward.go:234] lost connection to pod
E0810 07:04:08.188870 1 portforward.go:406] an error occurred forwarding 32839 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:08.189418 1 portforward.go:234] lost connection to pod
1.6601150482063682e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "bbf92a96-d65e-4792-a130-bc0d0f594557", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
E0810 07:04:08.443142 1 portforward.go:406] an error occurred forwarding 35395 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:08.907719 1 portforward.go:406] an error occurred forwarding 33055 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:08.907842 1 portforward.go:234] lost connection to pod
E0810 07:04:09.192174 1 portforward.go:406] an error occurred forwarding 34689 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
E0810 07:04:09.192605 1 portforward.go:234] lost connection to pod
E0810 07:04:09.616450 1 portforward.go:406] an error occurred forwarding 42301 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused
1.6601150500930722e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "371cf80b-1e05-4126-ab31-639b04c5d478", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6601150507833595e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "2c813143-c2f5-4398-b6b7-bf3e92bd4350", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

Logs of the first node: kubectl -n emqx logs emqx-0 -c emqx

hostname: emqx-0: Host not found
Starting emqx on node [email protected]
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:09.807451+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:11.816562+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]']
2022-08-10T07:04:11.816736+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.569069+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:17.569265+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:25.334693+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:25.334864+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.698077+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:32.698264+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.495689+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:38.495870+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms

The logs just stay the same after that.

axkng avatar Aug 10 '22 07:08 axkng

Is this the first deployment? Have you deployed emqx before and deleted it?

Rory-Z avatar Aug 10 '22 07:08 Rory-Z

This is the first deployment of that broker. But yes, I tried to deploy other ones before.

axkng avatar Aug 10 '22 08:08 axkng

Could you please show the logs for emqx-1 and emqx-2?

Rory-Z avatar Aug 10 '22 08:08 Rory-Z

Sure thing.

kubectl -n exo-emqx logs emqx-1 -c emqx

hostname: emqx-1: Host not found
Starting emqx on node [email protected]
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:11.429818+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:12.467104+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]']
2022-08-10T07:04:12.467272+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.639297+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:17.639472+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:24.940386+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:24.940561+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:30.877727+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:30.877912+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.386440+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']

kubectl -n emqx logs emqx-2 -c emqx

hostname: emqx-2: Host not found
Starting emqx on node [email protected]
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:21.079133+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:24.909316+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:24.909510+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.263225+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:32.263384+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:37.785043+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']
2022-08-10T07:04:37.785206+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:43.694825+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['[email protected]','[email protected]']

Again, the logs just stay the same.

axkng avatar Aug 10 '22 08:08 axkng

@qzhuyan Do you have any idea?

Rory-Z avatar Aug 10 '22 08:08 Rory-Z

After talking to @Rory-Z, we think it relates to the publishNotReadyAddresses flag in the k8s service.

@Rory-Z will release a fix for it.

@Furragen You could manually set publishNotReadyAddresses to true and delete all the pods to verify this, or wait for the new release of the EMQX Operator.
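For anyone wanting to try the manual workaround, here is a minimal sketch using `kubectl patch`. The function names `set_publish_not_ready` and `restart_pods` are made up for illustration, and the service name, namespace, and label selector are assumptions you need to replace with the values from your own cluster. The operator may also reconcile the Service and revert a manual change, so treat this only as a quick verification step.

```shell
#!/bin/sh
# Hypothetical helper: set publishNotReadyAddresses=true on a Service.
# Arguments: service name, namespace (both assumed, check `kubectl get svc`).
set_publish_not_ready() {
  svc=$1; ns=$2
  kubectl -n "$ns" patch service "$svc" --type merge \
    -p '{"spec":{"publishNotReadyAddresses":true}}'
}

# Hypothetical helper: delete the pods matching a label selector so the
# StatefulSet recreates them. Arguments: namespace, label selector.
restart_pods() {
  ns=$1; selector=$2
  kubectl -n "$ns" delete pod -l "$selector"
}
```

Example call, assuming the headless service is named `emqx-headless` in namespace `emqx` and the pods carry the label `app=emqx`: `set_publish_not_ready emqx-headless emqx && restart_pods emqx app=emqx`.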

qzhuyan avatar Aug 10 '22 09:08 qzhuyan

Hi @qzhuyan , I tested this, but sadly the error stays the same.

axkng avatar Aug 10 '22 10:08 axkng

Hi @Furragen, EMQX Operator 1.2.5 is released. Please try again and let me know whether it works.

Rory-Z avatar Aug 10 '22 10:08 Rory-Z

Hi @Rory-Z , thank you for the new release, but sadly the error was not fixed.

axkng avatar Aug 10 '22 12:08 axkng

@Furragen Sounds frustrating. Is the EMQX pod log still the same?

Rory-Z avatar Aug 11 '22 01:08 Rory-Z

Hi @Furragen, could you please check the pod network? Run the following command in an EMQX pod:

nslookup -type=srv $(headless service name).$(namespace).svc.cluster.local

You should get output like this:

emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-0.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-1.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-2.emqx-headless.default.svc.cluster.local

Then check network connectivity to another pod:

nc -zv emqx-2.emqx-headless.default.svc.cluster.local 8081

Output like this means the connection succeeded:

emqx-2.emqx-headless.default.svc.cluster.local (172.17.0.8:8081) open
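To run the same connectivity check against every peer at once, something like the following could be used from inside an EMQX pod. This is only a sketch: `peer_fqdn` and `check_peers` are made-up helper names, the service name, namespace, and port are assumptions taken from the example output above, and it assumes the `nc` in the image supports `-z` and `-w`.

```shell
#!/bin/sh
# Build the stable DNS name of one StatefulSet pod behind a headless service.
# Arguments: pod name, headless service name, namespace.
peer_fqdn() {
  printf '%s.%s.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

# Probe each listed pod on one TCP port and report whether it is reachable.
# Arguments: service, namespace, port, then one or more pod names.
check_peers() {
  svc=$1; ns=$2; port=$3
  shift 3
  for pod in "$@"; do
    host=$(peer_fqdn "$pod" "$svc" "$ns")
    # -z: connect without sending data, -w 3: give up after 3 seconds
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "open   $host:$port"
    else
      echo "FAILED $host:$port"
    fi
  done
}
```

Example call: `check_peers emqx-headless default 8081 emqx-0 emqx-1 emqx-2`. A `FAILED` line for a pod whose name resolves fine points at a connectivity or listener problem rather than a DNS problem.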

Rory-Z avatar Aug 11 '22 01:08 Rory-Z

So the lookup worked fine. My cluster uses IPv6 btw. Could that be a problem?

Network ping did not work.

axkng avatar Aug 11 '22 05:08 axkng

Network ping did not work.

I think that is the reason.

Could you please check whether pinging another EMQX pod by its IP works from inside an EMQX pod?

Rory-Z avatar Aug 11 '22 06:08 Rory-Z

In a StatefulSet, each pod should have a stable network ID: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id. EMQX nodes use this network ID to discover each other, so if the network doesn't work, the EMQX cluster will fail to form.

Because this is a k8s feature, you may need to check it on the AWS EKS side.

Rory-Z avatar Aug 11 '22 06:08 Rory-Z

The direct way via the IP of the pod also did not work. And I think I know why: EMQX only listens on IPv4.

netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:11883         0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:4370            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8083            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8084            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:5369            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:1883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:18083           0.0.0.0:*               LISTEN      1/emqx

This was from inside the emqx-0 pod. Like I said, the cluster uses IPv6, so this can not work. Is there any way to make EMQX listen to IPv6?
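A quick way to spot this from a netstat listing is to filter for listeners bound to an IPv4 address. The sketch below assumes netstat output in the column layout shown above (local address in column 4, state in column 6); `ipv4_only_ports` is a made-up helper name.

```shell
#!/bin/sh
# Print the port of every listening TCP socket bound to an IPv4 address.
# Reads netstat-style lines on stdin; column 4 is the local address and
# column 6 is the socket state, as in the listing above.
ipv4_only_ports() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$/ {
    n = split($4, parts, ":")   # the port is the last colon-separated field
    print parts[n]
  }'
}
```

Example call: `netstat -tln | ipv4_only_ports`. Any port printed here is bound to an IPv4 address such as 0.0.0.0 and cannot accept connections arriving over IPv6. Listeners bound to `::` are not printed.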

axkng avatar Aug 11 '22 06:08 axkng

@Furragen You can deploy EMQX like this:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
spec:
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083

Sorry, I don't have an IPv6 cluster, so I need you to try this.

Rory-Z avatar Aug 11 '22 07:08 Rory-Z

Absolutely no problem. I redeployed the broker and we got a little further. The logs and the error stay the same:

emqx_ctl cluster_status
Node '[email protected]' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found

But: doing the ping by hand with nc now succeeds. So the connection works, but something is still broken. Could there be more listeners that I need to switch to v6?

axkng avatar Aug 11 '22 07:08 axkng

Cool, you can change all the listeners you care about to the IPv6 format, see https://www.emqx.io/docs/en/v4.4/configuration/configuration.html#listener-tcp-external

Could you please run the following command in an EMQX pod:

emqx eval "net_adm:ping('[email protected]')."

Here [email protected] is the node name of another EMQX node.

Rory-Z avatar Aug 11 '22 09:08 Rory-Z

So, I tried this and the command you mentioned did not succeed. The error is:

Node '[email protected]' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

This error always appears when running the emqx-command.

Also, I have tested around with setting listeners to IPv6:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      listener.ssl.external: :::8883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083
      listener.tcp.internal: :::11883
      listener.ws.external: :::8083
      listener.wss.external: :::8084

The pods start, but the dashboard-plugin seems to be unhappy:

2022-08-11T09:45:39.371399+00:00 [alert] [Plugins] Plugin emqx_dashboard load failed with {function_clause,[{emqx_plugins,apply_configs,[{error,transform_datatypes,{errorlist,[{error,{transform_type,"dashboard.listener.http"}},{error,{conversion,{":::18083",integer}}}]}}],[{file,"emqx_plugins.erl"},{line,302}]},{emqx_plugins,load_plugin,2,[{file,"emqx_plugins.erl"},{line,325}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{emqx_app,start,2,[{file,"emqx_app.erl"},{line,50}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}

Looks like it cannot convert the v6-notation.

On top of that I found three other settings that would need tuning I think. The first one is cluster.proto_dist. The docs mention that I could set it to inet6_tcp to use IPv6. But when I do that, the pods do not start anymore.

And then there are cluster.mcast.iface and rpc.tcp_server_ip. These two settings do not seem to support IPv6 according to the docs. Is that correct?

The listeners I just mentioned and the ones in my manifest seem to be the ones EMQX starts by default, so I did not look further.

Do you know of anyone using EMQX with IPv6?

axkng avatar Aug 11 '22 09:08 axkng

@qzhuyan @zmstone Need help

Rory-Z avatar Aug 11 '22 10:08 Rory-Z

Node '[email protected]' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

means the peer node that we are pinging is unreachable.

qzhuyan avatar Aug 11 '22 10:08 qzhuyan

I ran the command from the emqx-0 pod, trying to query emqx-1. Does that not mean emqx-0 has a problem?

axkng avatar Aug 11 '22 11:08 axkng

It's likely that EMQX's distribution and RPC library does not support ipv6 that well. We'll investigate it.

zmstone avatar Aug 11 '22 13:08 zmstone

Good to know, thank you.

axkng avatar Aug 12 '22 04:08 axkng

Sorry for the late update. Since no issue or PR was linked to this one, and it was quite some time ago, I cannot be very sure, but it seems the IPv6 issues are already resolved. Here are fixes for the RPC lib with respect to IPv6: https://github.com/emqx/gen_rpc/pull/38 and https://github.com/emqx/emqx/pull/11734

zmstone avatar May 10 '24 07:05 zmstone