
Hubble UI doesn't work, fresh Cilium 1.12.1 install, "Data stream has failed on the UI backend: EOF"

Open · samwho opened this issue 3 years ago · 43 comments

I reinstalled Cilium in my bare-metal cluster at home today. I installed 1.12.1 and ran cilium hubble enable --ui, and all went well. But when I open http://localhost:12000 in my browser, I see this:

[screenshot: Hubble UI]

The page stays like this indefinitely, accumulating more and more GetEvents calls:

[screenshot: browser network tab with repeated GetEvents calls]

In the browser console I see the following:

[screenshot: browser console errors]

Uncertain how to proceed with debugging. Any help would be appreciated.
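For context, the commands involved were roughly the following (a sketch; the exact install invocation for this cluster isn't shown here and may differ):

cilium install --version 1.12.1   # hypothetical install invocation
cilium hubble enable --ui
cilium hubble ui                  # port-forwards the Hubble UI to http://localhost:12000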

samwho avatar Sep 11 '22 13:09 samwho

I just attempted upgrading to 1.13.0-rc0 and I experience the same problem.

samwho avatar Sep 11 '22 13:09 samwho

Hi, I want to work on this issue. Please assign it to me @samwho @gandro @rolinh

DhwanishShah avatar Sep 12 '22 06:09 DhwanishShah

Having the same issue running on minikube. No flows are registering in the UI or via the hubble CLI utility.

samuraii78 avatar Sep 12 '22 12:09 samuraii78

> Hi, I want to work on this issue. Please assign it to me @samwho @gandro @rolinh

We're not yet sure what the root cause is. If you know it, please feel free to share or fix. Otherwise I think we need more info.

What response headers do you see in the browser network tab, @samwho?

gandro avatar Sep 13 '22 08:09 gandro

The EOF error comes from hubble-relay. We need the logs from the hubble-ui pod's backend container and from the hubble-relay pod.
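For anyone else hitting this, a sketch of how to collect those logs (assuming the default kube-system namespace and the deployment/container names used by the standard install):

kubectl -n kube-system logs deployment/hubble-relay
kubectl -n kube-system logs deployment/hubble-ui -c backend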

geakstr avatar Sep 13 '22 13:09 geakstr

Can confirm the same issue on 1.12.2; K8s 1.25.2, ARM/Pi4 architecture.

pascal71 avatar Oct 02 '22 18:10 pascal71

relay logs:

level=info msg="Starting gRPC server..." options="{peerTarget:hubble-peer.kube-system.svc.cluster.local:443 dialTimeout:5000000000 retryTimeout:30000000000 listenAddress::4245 metricsListenAddress: log:0x400037c2a0 serverTLSConfig:<nil> insecureServer:true clientTLSConfig:0x40000acbe8 clusterName:default insecureClient:false observerOptions:[0xbfc7d0 0xbfc8d0] grpcMetrics:<nil> grpcUnaryInterceptors:[] grpcStreamInterceptors:[]}" subsys=hubble-relay
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"
level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"

pascal71 avatar Oct 02 '22 18:10 pascal71

Logs of backend container in UI:

level=info msg="running hubble status checker\n" subsys=ui-backend
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=info msg="hubble status checker: connection to hubble-relay established\n" subsys=ui-backend
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=info msg="fetching hubble flows: connection to hubble-relay established\n" subsys=ui-backend
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend
level=error msg="flow error: EOF\n" subsys=ui-backend
level=info msg="hubble status checker: stopped\n" subsys="ui-backend:status-checker"
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=error msg="fetching hubble flows: connecting to hubble-relay (attempt #1) failed: rpc error: code = Canceled desc = context canceled\n" subsys=ui-backend
level=info msg="fetching hubble flows: stream (ui backend <-> hubble-relay) is closed\n" subsys=ui-backend
level=info msg="Get flows request: number:10000  follow:true  blacklist:{source_label:\"reserved:unknown\"  source_label:\"reserved:host\"  source_label:\"k8s:k8s-app=kube-dns\"  source_label:\"reserved:remote-node\"  source_label:\"k8s:app=prometheus\"  source_label:\"reserved:kube-apiserver\"}  blacklist:{destination_label:\"reserved:unknown\"  destination_label:\"reserved:host\"  destination_label:\"reserved:remote-node\"  destination_label:\"k8s:app=prometheus\"  destination_label:\"reserved:kube-apiserver\"}  blacklist:{destination_label:\"k8s:k8s-app=kube-dns\"  destination_port:\"53\"}  blacklist:{source_fqdn:\"*.cluster.local*\"}  blacklist:{destination_fqdn:\"*.cluster.local*\"}  blacklist:{protocol:\"ICMPv4\"}  blacklist:{protocol:\"ICMPv6\"}  whitelist:{source_pod:\"default/\"  event_type:{type:1}  event_type:{type:4}  event_type:{type:129}  reply:false}  whitelist:{destination_pod:\"default/\"  event_type:{type:1}  event_type:{type:4}  event_type:{type:129}  reply:false}" subsys=ui-backend
level=info msg="running hubble status checker\n" subsys=ui-backend
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=info msg="hubble status checker: connection to hubble-relay established\n" subsys=ui-backend
level=info msg="fetching hubble flows: connection to hubble-relay established\n" subsys=ui-backend
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend
level=error msg="flow error: EOF\n" subsys=ui-backend
level=info msg="hubble status checker: stopped\n" subsys="ui-backend:status-checker"
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend
level=error msg="fetching hubble flows: connecting to hubble-relay (attempt #1) failed: rpc error: code = Canceled desc = context canceled\n" subsys=ui-backend
level=info msg="fetching hubble flows: stream (ui backend <-> hubble-relay) is closed\n" subsys=ui-backend

Hope this helps.

Kind regards,

Pascal

pascal71 avatar Oct 02 '22 18:10 pascal71

@pascal71 Your issue may be a different one. Would you mind opening a new issue?

rolinh avatar Oct 03 '22 07:10 rolinh

I get the same error as in the first comment and the same logs as Pascal. I'm using Cilium v1.11.8 and just downloaded the latest hubble and cilium binaries today. Here is the relay log:

level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:443"

The hubble-ui backend log has:

level=info msg="Get flows request: number:10000 follow:true blacklist:{source_label:\"reserved:unknown\" source_label:\"reserved:host\" source_label:\"k8s:k8s-app=kube-dns\" source_label:\"reserved:remote-node\" source_label:\"k8s:app=prometheus\" source_label:\"reserved:kube-apiserv
level=info msg="running hubble status checker\n" subsys=ui-backend                                                                                                                                                                                                                          
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend                                                                                                                                                                                         
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend                                                                                                                                                                                  
level=info msg="hubble status checker: connection to hubble-relay established\n" subsys=ui-backend                                                                                                                                                                                          
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend                                                                                                                                                                                  
level=info msg="fetching hubble flows: connection to hubble-relay established\n" subsys=ui-backend                                                                                                                                                                                          
level=info msg="fetching hubble flows: connecting to hubble-relay (attempt #1)\n" subsys=ui-backend                                                                                                                                                                                         
level=error msg="flow error: EOF\n" subsys=ui-backend                                                                                                                                                                                                                                       
level=info msg="hubble status checker: stopped\n" subsys="ui-backend:status-checker"                                                                                                                                                                                                        
level=info msg="hubble-relay grpc client created (hubble-relay addr: hubble-relay:80)\n" subsys=ui-backend                                                                                                                                                                                  
level=error msg="fetching hubble flows: connecting to hubble-relay (attempt #1) failed: rpc error: code = Canceled desc = context canceled\n" subsys=ui-backend                                                                                                                             
level=info msg="fetching hubble flows: stream (ui backend <-> hubble-relay) is closed\n" subsys=ui-backend   

FYI: I have been using Cilium for quite some time, and each time I update I try this again; it has not worked a single time. The errors are getting fewer, but it would be useful to actually get back to these reports and suggest how users could help get this working.

ensonic avatar Oct 21 '22 14:10 ensonic

Hi, I have the same issue in my k8s environment. Have you found a fix to resolve this issue?

s-reynier avatar Dec 02 '22 16:12 s-reynier

Same issue here.

brotherdust avatar Jan 04 '23 22:01 brotherdust

Hello,

Same issue on a rke2 cluster

  • conf:

kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    hubble:
      listenAddress: ":4245"
      enabled: true
      metrics:
        enabled:
        - dns:query;ignoreAAAA
        - drop
        - tcp
        - flow
        - port-distribution
        - icmp
        - http
      peerService:
        clusterDomain: cluster.local
      relay:
        enabled: true
      ui:
        enabled: true
      tls:
        enabled: false

  • Cilium status:

Defaulted container "cilium-agent" out of: cilium-agent, install-portmap-cni-plugin (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.23 (v1.23.14+rke2r1) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Disabled
Host firewall:          Disabled
CNI Chaining:           portmap
Cilium:                 Ok   1.12.3 (v1.12.3-1c466d2)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 7/254 allocated from 10.42.0.0/24,
BandwidthManager:       Disabled
Host Routing:           Legacy
Masquerading:           IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:      38/38 healthy
Proxy Status:           OK, ip 10.42.0.152, 0 redirects active on ports 10000-20000
Global Identity Range:  min 256, max 65535
Hubble:                 Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 505.43   Metrics: Ok
Encryption:             Disabled
Cluster health:         3/3 reachable (2023-01-05T13:48:39Z)

  • logs on hubble-relay:

level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local:80"

  • logs on hubble GUI:

Data stream has failed on the UI backend: EOF

  • Port 4244 is open on nodes

jvizier avatar Jan 05 '23 13:01 jvizier

> Same issue on a rke2 cluster (full config, status, and logs quoted above)
I think it's related to TLS (surprise!). I turned it off completely on relay, ui, and cilium configs and it started working.

brotherdust avatar Jan 05 '23 14:01 brotherdust

> I think it's related to TLS (surprise!). I turned it off completely on relay, ui, and cilium configs and it started working.

Nice! I already disabled TLS, but only on the Hubble side; I will look at the other parts. Thanks!

jvizier avatar Jan 05 '23 14:01 jvizier

I get the exact same issue. I was also able to fix it temporarily by disabling TLS, which seems like a bad idea.

ellakk avatar Feb 05 '23 11:02 ellakk

I get the exact same issue

eramax avatar Feb 05 '23 21:02 eramax

Any updates? I ran into exactly the same thing.

miathedev avatar Mar 26 '23 09:03 miathedev

This looks like it is not purely a hubble-ui issue, but a hubble/cilium issue. It usually indicates, for example, that the Cilium installation was done via Helm but Hubble was enabled with cilium-cli. I would suggest opening a new issue in https://github.com/cilium/cilium with a detailed description of how things were deployed. Please reference this issue there.
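A quick way to check how things were deployed, and to converge on Helm if Cilium itself was installed with Helm (a sketch; it assumes the release is named "cilium" in kube-system):

helm -n kube-system list
helm -n kube-system get values cilium

# If Cilium came from Helm, enable Hubble through Helm as well rather than via the cilium CLI:
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true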

geakstr avatar Mar 27 '23 01:03 geakstr

This also happens when Cilium and Hubble were both installed at the same time using Helm, so this does not need a new issue.

a1git avatar May 18 '23 12:05 a1git

If you have enabled the Traefik dashboard, try disabling it.

antonkobylko1990 avatar May 22 '23 18:05 antonkobylko1990

Same issue here, RKE2 with Cilium.

ulfaric avatar May 23 '23 16:05 ulfaric

Has anyone found a nice way to resolve this issue without removing and reinstalling a Helm chart?

pkoraca avatar May 24 '23 07:05 pkoraca

I have the same issue with Cilium 1.13.3 on upstream K8s. Everything is installed with Helm. Disabling TLS also fixed it for me.

  tls:
    enabled: false

camrossi avatar Jun 01 '23 05:06 camrossi

In my case the same error was happening with httpV2 enabled. Removing that line fixed the issue.

    metrics:
      serviceMonitor:
        enabled: true
      enableOpenMetrics: true
      enabled:
      - dns:query;ignoreAAAA
      - drop
      - tcp
      - flow
      - port-distribution
      - icmp
      - http
      # - httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction

I managed to reproduce the Hubble CLI issue as well, but could not fix it by disabling TLS (example below). Chart reinstall helped though.

This did not help:

hubble:
  relay:
    tls:
      server:
        enabled: false
  tls:
    enabled: false

pkoraca avatar Jun 01 '23 11:06 pkoraca

For me it was because my cluster domain is "cluster" (it is imperative for Cilium not to have a dotted cluster domain), BUT the Helm chart sets hubble.peerService.clusterDomain to "cluster.local" by default.

With Cilium 1.13.3 installed with Helm, setting the correct hubble.peerService.clusterDomain value fixed access to the UI for me, and I didn't need to disable TLS anywhere.

My Cilium values:

helm install cilium cilium/cilium --version 1.13.3 \
  --namespace kube-system \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.66.0.0/16 \
  --set ipam.operator.clusterPoolIPv4MaskSize=20 \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=172.16.66.200 \
  --set k8sServicePort=6443 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set operator.replicas=1 \
  --set tunnel=disabled \
  --set ipv4NativeRoutingCIDR=10.66.0.0/16 \
  --set autoDirectNodeRoutes=true \
  --set hubble.peerService.clusterDomain=cluster
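For anyone unsure what their cluster domain actually is, here is a quick check (a sketch, assuming CoreDNS with the default "coredns" ConfigMap name; rke2/k3s use different names):

kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | grep kubernetes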

samos667 avatar Jun 07 '23 23:06 samos667

In my case, the following configuration alone was not enough: the communication from hubble-relay to hubble-peer was failing due to Ubuntu's ufw.

I allowed access from Cilium's IP CIDR and it worked fine.

hubble:
  relay:
    tls:
      server:
        enabled: false
  tls:
    enabled: false
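For reference, a sketch of the kind of ufw rule this implies (the CIDR is a placeholder for your cluster's Cilium pod range; 4244 is the Hubble peer port mentioned above):

# hypothetical example; substitute your own Cilium CIDR
sudo ufw allow from 10.42.0.0/16 to any port 4244 proto tcp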

morix1500 avatar Jul 24 '23 03:07 morix1500

You must visit http://localhost:12000/ because the UI seems to forbid outside IPs. Are you visiting via localhost?

V0idk avatar Jul 24 '23 17:07 V0idk

The CLI simply doesn't support TLS being disabled.

When the following flags are issued:

cilium hubble enable --ui --helm-set hubble.tls.enabled=false --helm-set hubble.tls.auto.enabled=false --helm-set hubble.relay.tls.server.enabled=false
  1. It causes the relay Secret not to be generated. This is what we want.

  2. Secret creation is forced in the CLI regardless of (1).

func (k *K8sHubble) enableRelay(ctx context.Context) (string, error) {
        ...
	k.Log("✨ Generating certificates...")

	if err := k.createRelayCertificates(ctx); err != nil {
		return "", err
	}
        ...
}

func (k *K8sHubble) createRelayCertificates(ctx context.Context) error {
	k.Log("🔑 Generating certificates for Relay...")
	...
	return k.createRelayClientCertificate(ctx)
}

func (k *K8sHubble) createRelayClientCertificate(ctx context.Context) error {
        secret, err := k.generateRelayCertificate(defaults.RelayClientSecretName)
        if err != nil {
                return err
        }

        _, err = k.client.CreateSecret(ctx, secret.GetNamespace(), &secret, metav1.CreateOptions{})
        if err != nil {
                return fmt.Errorf("unable to create secret %s/%s: %w", secret.GetNamespace(), secret.GetName(), err)
        }

        return nil
}

secret is empty because of (1).

k.client.CreateSecret fails because it's called with empty "payload" (the empty secret).
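Until the CLI handles this, the same values can be applied with Helm directly, which avoids the CLI's forced certificate path (a sketch, assuming Cilium itself was installed as the Helm release "cilium" in kube-system):

helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set hubble.tls.enabled=false \
  --set hubble.tls.auto.enabled=false \
  --set hubble.relay.tls.server.enabled=false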

ervinb avatar Aug 18 '23 10:08 ervinb

For people who would like to enable the httpV2 Hubble metric: try removing the \ in the labelsContext separator.

I think the documentation has a typo.


    metrics:
      serviceMonitor:
        enabled: true
      enableOpenMetrics: true
      enabled:
      - dns:query;ignoreAAAA
      - drop
      - tcp
      - flow
      - port-distribution
      - icmp
      - http
      - httpV2:exemplars=true;labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction

lllsJllsJ avatar Aug 29 '23 07:08 lllsJllsJ