ICMP not working on Kubernetes even if sysctl -w net.ipv4.ping_group_range="0 2147483647"

Open jerome-karabenli opened this issue 2 years ago • 12 comments

Describe the bug

Ping does not work even when sysctl -w net.ipv4.ping_group_range="0 2147483647" is configured. net.ipv4.ping_group_range is set in an initContainer that is privileged and runs as root.

I tried adding an alpine container to the same pod as gatus. I can successfully use ping in this alpine container, which has the same securityContext as the gatus container and runs as a non-root user with UID and GID 65534 (nobody).

I tried to ping google.com
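A rough diagnostic sketch for the check described above, assuming a Linux container: it reads net.ipv4.ping_group_range from /proc and reports whether the process's primary GID (65534 in this report) falls inside it. The helper names parsePingGroupRange and gidAllowed are hypothetical and are not part of Gatus. The sysctl is scoped to the network namespace and the kernel default is "1 0", which allows no group, so the value has to be set in the pod's own namespace. Note that the kernel also considers supplementary groups, which this sketch ignores.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePingGroupRange parses the two whitespace-separated GIDs found in
// /proc/sys/net/ipv4/ping_group_range, for example "0\t2147483647".
// Hypothetical helper, for illustration only.
func parsePingGroupRange(s string) (low, high int64, err error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	if low, err = strconv.ParseInt(fields[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if high, err = strconv.ParseInt(fields[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return low, high, nil
}

// gidAllowed reports whether gid lies inside the inclusive range [low, high].
// With the kernel default "1 0" the range is empty, so this is always false.
func gidAllowed(gid, low, high int64) bool {
	return gid >= low && gid <= high
}

func main() {
	raw, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("cannot read sysctl:", err)
		return
	}
	low, high, err := parsePingGroupRange(string(raw))
	if err != nil {
		fmt.Println("cannot parse sysctl:", err)
		return
	}
	// Only the primary GID is checked here; supplementary groups are ignored.
	gid := int64(os.Getgid())
	fmt.Printf("range=%d..%d gid=%d allowed=%v\n", low, high, gid, gidAllowed(gid, low, high))
}
```

Running this inside the gatus container (rather than the initContainer) would show whether the sysctl set by the initContainer is actually visible to the process that sends the pings.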

Used config in config.yaml:

    endpoints:
      - name: TEST
        enabled: true
        url: "icmp://google.com"
        interval: 60s
        conditions:
          - "[CONNECTED] == true"
        client:
          timeout: 30s

What do you see?

Endpoint is returning not ok

What do you expect to see?

Ping success

List the steps that must be taken to reproduce this issue

Use an ICMP endpoint with Gatus running in Kubernetes, using this endpoints config:

    endpoints:
      - name: TEST
        enabled: true
        url: "icmp://google.com"
        interval: 60s
        conditions:
          - "[CONNECTED] == true"
        client:
          timeout: 30s

Use an initContainer with root privileges, based on the alpine image, and execute this command: sysctl -w net.ipv4.ping_group_range="0 2147483647"

Version

twinproduction/gatus:v5.7.0

Additional information

No response

jerome-karabenli avatar Mar 04 '24 15:03 jerome-karabenli

I'm seeing the same behavior. I toyed around with passing sysctls to the securityContext and the method described by @jerome-karabenli -- both are unable to ping outside the pod.

I've searched existing issues (#633, #182, #105) and I'm wondering if I'm missing something.

kevin7s-io avatar Mar 06 '24 03:03 kevin7s-io

The issue is here https://github.com/TwiN/gatus/blob/master/client/client.go#L246

pinger.SetPrivileged(runtime.GOOS != "darwin")

This sets privileged to true on Linux, so the privileged ping is used instead of the unprivileged one. See https://github.com/prometheus-community/pro-bing/blob/ac3b40f1f0a7438a429e9bf6f2bc2a94ba286e39/ping.go#L430

Linux and darwin both support NonPrivileged ping (https://pkg.go.dev/golang.org/x/net/icmp?utm_source=godoc#example-PacketConn-NonPrivilegedPing) so I would expect it to be safe to only filter for windows.

The change was made here: https://github.com/TwiN/gatus/commit/c423afb0bf87d0e1be2f73fec25b5199acf1aed7 for issue #132 but darwin supports non-privileged pings so the windows only condition should be okay.
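To make the suggestion concrete, here is a minimal sketch of the proposed condition, assuming the only change is which operating systems get the privileged mode. The helper usePrivilegedPing is a hypothetical name for illustration; it is not taken from Gatus or from #748, and the actual fix may differ.

```go
package main

import (
	"fmt"
	"runtime"
)

// usePrivilegedPing is a hypothetical helper sketching the proposal above:
// request privileged (raw socket) pings only on Windows, and fall back to
// unprivileged pings on Linux and darwin.
// The current code effectively evaluates goos != "darwin" instead.
func usePrivilegedPing(goos string) bool {
	return goos == "windows"
}

func main() {
	for _, goos := range []string{"linux", "darwin", "windows"} {
		fmt.Printf("%-8s privileged=%v\n", goos, usePrivilegedPing(goos))
	}
	// In the client this value would be handed to pinger.SetPrivileged(...).
	fmt.Println("this host:", runtime.GOOS, usePrivilegedPing(runtime.GOOS))
}
```

With this condition, Linux would rely on the unprivileged ping path, which is the one governed by net.ipv4.ping_group_range.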

h3mmy avatar Mar 17 '24 19:03 h3mmy

Feel free to make a PR if you think that'll fix it!

TwiN avatar Apr 11 '24 03:04 TwiN

I created #748 in an attempt to address it, but I would appreciate if somebody (either @jerome-karabenli, @kevin7s-io, @h3mmy, @heathcliff26 or anybody reading this) could test it on their end and report back on whether #748 fixed it.

I've just built a container image; if you'd like to try it, pull twinproduction/gatus:experimental.

Note that the image in question is only built for linux/amd64.

TwiN avatar Apr 28 '24 23:04 TwiN

Works on Windows, but not on my Kubernetes cluster, even with the following configuration on the pods

          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              add:
                - NET_RAW

https://github.com/influxdata/influxdata-docker/pull/550 and https://github.com/influxdata/influxdata-docker/issues/547 seem to have some information on what needs to be done to fix this.

Looking at https://github.com/containerd/containerd/issues/6924, perhaps this will be fixed automagically too for Kubernetes 🤔

TwiN avatar Apr 28 '24 23:04 TwiN

I have tested it with podman and the experimental image works when running as root, but not in rootless mode.

I also tested running v5.10.0 as root, since I hadn't before, but it did not work.

So I guess the fix works, but some privileges still need to be set.

heathcliff26 avatar Apr 29 '24 06:04 heathcliff26

I'm currently experiencing this issue, where my config that was working in docker doesn't work in Kubernetes.

I tried a bunch of things, such as capabilities, and using the same SC I use in blackbox-exporter:

podSecurityContext:
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 65536"

which also didn't work. I already have set

enable_unprivileged_ports = true
enable_unprivileged_icmp = true

and that doesn't appear to help either.

joryirving avatar May 30 '24 14:05 joryirving

> I created #748 in an attempt to address it, but I would appreciate if somebody (either @jerome-karabenli, @kevin7s-io, @h3mmy, @heathcliff26 or anybody reading this) could test it on their end and report back on whether #748 fixed it.
>
> I've just built a container image; if you'd like to try it, pull twinproduction/gatus:experimental.
>
> Note that the image in question is only built for linux/amd64.

So sorry I missed this. I went ahead and tested the branch in #748 and it works in my k3s cluster.

This is a link to my HelmRelease: https://github.com/h3mmy/bloopySphere/blob/96329ee8e913168f11198920db4cd0f758b1ea68/cluster/apps/monitoring/gatus/app/helm-release.yaml

Important bits:

  • container running as non-root
  • dropped ALL capabilities
  • disallow privilege escalation
  • I do have an annotation to set the sysctls

    annotations:
      reloader.stakater.com/auto: "true"
      # https://github.com/prometheus-community/pro-bing#linux
      security.alpha.kubernetes.io/sysctls: net.ipv4.ping_group_range=0 2147483647

And the config I used as a test case: https://github.com/h3mmy/bloopySphere/blob/96329ee8e913168f11198920db4cd0f758b1ea68/cluster/apps/networking/traefik/external-services/nas-camelus.yaml


apiVersion: v1 kind: ConfigMap metadata: name: camelus-plexii-gatus-ep namespace: networking labels: gatus.io/enabled: "true" data: config.yaml: | endpoints: - name: camelus-plexii-ping group: infrastructure url: icmp://${NAS_ADDRESS} interval: 5m ui: hide-url: true hide-hostname: true conditions: - "[CONNECTED] == true" alerts: - type: discord

Let me know if you'd like me to try any different arrangements for different scenarios.

h3mmy avatar May 31 '24 01:05 h3mmy

The experimental image resolved it for me. I didn't need the annotation.

joryirving avatar May 31 '24 01:05 joryirving

> The experimental image resolved it for me. I didn't need the annotation.

It may vary with host distribution and kernel security profiles. I'm not an expert though.

Do you have any security profiles enabled on your host? AppArmor, seccomp, SELinux, etc?

h3mmy avatar May 31 '24 02:05 h3mmy

I do not.

joryirving avatar May 31 '24 13:05 joryirving

The following pod security context (pod, not container) fixed the issue for me:

      securityContext:
        sysctls:
          - name: net.ipv4.ping_group_range
            value: 0 65536

Downside is that this fix is Kubernetes-specific, and releasing this as-is would break people deploying Gatus on Docker.

TwiN avatar Jul 01 '24 22:07 TwiN