
connectivity test: "pod-to-world" case fails with HTTP 403 response from cloudflare

waynr opened this issue on Mar 22, 2022

Bug report

My team's e2e tests (which include the cilium connectivity tests) experience flaky behavior around the pod-to-world test case that looks like this:

  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null https://one.one.one.one:443" failed: command terminated with exit code 22
  ℹ️  curl output:
  curl: (22) The requested URL returned error: 403
  10.244.2.92:41936 -> 1.0.0.1:443 = 403

  📄 No flows recorded during action https-to-one-one-one-one-0
  📄 No flows recorded during action https-to-one-one-one-one-0
  [.] Action [no-policies/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-6488dcf5d4-4fjhn (10.244.2.92) -> one-one-one-one-https-index (one.one.one.one:443)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null https://one.one.one.one:443/index.html" failed: command terminated with exit code 22
  ℹ️  curl output:
  curl: (22) The requested URL returned error: 403
  10.244.2.92:45752 -> 1.1.1.1:443 = 403
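
One way to check that the 403 really originates from Cloudflare's edge rather than from anything inside the cluster is to repeat the request from the test client pod and dump the response headers (a Cloudflare firewall block is served with a server: cloudflare and a cf-ray header). This is only a rough sketch; the pod name is the one from the failing run above and will differ on other clusters:

  # Re-run the failing request from the test client pod and print the response
  # headers instead of the body (pod name taken from the output above).
  kubectl exec -n cilium-test client-6488dcf5d4-4fjhn -- \
    curl -s -o /dev/null -D - --connect-timeout 5 https://one.one.one.one:443/

  # A block issued at Cloudflare's edge typically looks like:
  #   HTTP/2 403
  #   server: cloudflare
  #   cf-ray: <ray id>
  # i.e. the request did leave the cluster and was rejected outside it.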

I suspect the issue here is that the cloud provider's (DigitalOcean's) worker node IPs get flagged as abusive by Cloudflare and are automatically blocked by its firewall. I can imagine two possible solutions here:

  • allow the "world" target to be configurable (https://github.com/cilium/cilium-cli/issues/222)
  • accept HTTP 403 responses if it can be determined that the status is actually coming from a target outside the cluster (since, to my understanding, this would still validate connectivity between pod and world); a rough sketch of such a check follows below. (https://github.com/cilium/cilium-cli/issues/174)
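
As an illustration of the second option, a wrapper around the same curl probe could treat a 403 from the external target as proof that the packet left the cluster and got a reply. This is only a sketch of the idea, not current cilium-cli behavior; the pod and namespace names are taken from the failing run above:

  # Sketch: count an HTTP 403 from one.one.one.one as successful pod-to-world
  # connectivity, since the status code is produced by Cloudflare outside the
  # cluster (pod/namespace names come from the test output above).
  code=$(kubectl exec -n cilium-test client-6488dcf5d4-4fjhn -- \
    curl --silent --output /dev/null --connect-timeout 5 \
         --write-out '%{response_code}' https://one.one.one.one:443)
  case "$code" in
    2??|403) echo "pod-to-world reachable (HTTP $code)" ;;
    *)       echo "pod-to-world check failed (HTTP $code)"; exit 1 ;;
  esac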

General Information

  • Cilium CLI version (run cilium version): 0.9.1
  • Orchestration system version in use (e.g. kubectl version, ...): ?
  • Platform / infrastructure information (e.g. AWS / Azure / GCP, image / kernel versions): DigitalOcean

How to reproduce the issue

  1. Create a DigitalOcean Kubernetes cluster: https://docs.digitalocean.com/products/kubernetes/quickstart/
  2. Get kubeconfig, set KUBECONFIG environment variable: https://docs.digitalocean.com/products/kubernetes/how-to/connect-to-cluster/
  3. Run the connectivity tests with Cilium CLI 0.9.1: cilium connectivity test
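
For reference, the three steps above look roughly like the following with doctl and the Cilium CLI (cluster name, region, and node size are placeholders, and the doctl flags assume its current syntax, which may differ between versions):

  # 1. Create a DigitalOcean Kubernetes cluster (name/region/size are examples).
  doctl kubernetes cluster create cilium-e2e \
    --region nyc1 --size s-2vcpu-4gb --count 2

  # 2. Write the kubeconfig somewhere and point KUBECONFIG at it.
  doctl kubernetes cluster kubeconfig show cilium-e2e > ./cilium-e2e.kubeconfig
  export KUBECONFIG=$PWD/cilium-e2e.kubeconfig

  # 3. Run the connectivity tests.
  cilium connectivity test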

In order to reproduce this it may be necessary to repeatedly create and delete worker node pools until a worker node is assigned an IP from the DigitalOcean ASN that Cloudflare has flagged as abusive.
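
That node-pool recycling might look roughly like this; pool names, sizes, and counts are placeholders, and the node-pool subcommands are assumed from doctl's current documentation:

  # Add a fresh worker pool, then remove the old one, so the cluster always
  # keeps at least one pool (names/sizes/counts are placeholders).
  doctl kubernetes cluster node-pool create cilium-e2e \
    --name workers-2 --size s-2vcpu-4gb --count 2
  doctl kubernetes cluster node-pool delete cilium-e2e workers-1 --force

  # Re-run the tests; a Cloudflare 403 on the pod-to-world case reproduces this.
  cilium connectivity test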
