
Retain default group membership when configuring other properties on probe health groups

Open · tharakadesilva opened this issue 10 months ago · 5 comments

When configuring additional paths for Spring Boot Actuator health groups ('readiness', 'liveness', etc.), health indicators that are unrelated to those groups unexpectedly cause them to fail. This presents a risk for production environments and can lead to unnecessary restarts or pod terminations.

Impact:

This issue introduces the risk of unnecessary pod restarts or service disruptions in environments that use health endpoints for availability checks (e.g., Kubernetes). A failing health indicator that should only affect the main health endpoint can also fail the liveness or readiness probes, leading to cascading failures.

Steps to reproduce:

  1. Bootstrap a new project with Spring Boot 3.2.4.

  2. Create a custom health indicator that always reports "DOWN":

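    // Registered as the "my" indicator: the "HealthIndicator" suffix is dropped from the bean name
    @Bean
    HealthIndicator myHealthIndicator() {
      // Always report DOWN so the effect on each health group is easy to observe
      return () -> Health.down().build();
    }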
    
  3. Add additional paths to the readiness and liveness groups:

    management.endpoint.health.group.readiness.additional-path=server:/readyz
    management.endpoint.health.group.liveness.additional-path=server:/livez
    

Expected behavior (this works as expected without the additional-path properties from step 3):

  • /actuator/health - should fail
  • /actuator/health/readiness - should pass
  • /readyz - should pass
  • /actuator/health/liveness - should pass
  • /livez - should pass

Actual behavior:

  • /actuator/health - fails
  • /actuator/health/readiness - fails (unexpectedly)
  • /readyz - fails (unexpectedly)
  • /actuator/health/liveness - fails (unexpectedly)
  • /livez - fails (unexpectedly)

I would only have expected readiness and liveness to fail if I had set the following properties:

management.endpoint.health.group.readiness.include=my
management.endpoint.health.group.liveness.include=my

Demo Project (see tests):

demo.zip

Potential Workaround (Temporary):

Avoid using additional-path on health groups where this side-effect could be disruptive.

tharakadesilva, Apr 09 '24 12:04

This is a little confusing, but here's what happens when you declare the following:

management.endpoint.health.group.readiness.additional-path=server:/readyz
management.endpoint.health.group.liveness.additional-path=server:/livez

Spring Boot creates two new groups named readiness and liveness which, by default, include all health indicators. If you want these groups to include only the liveness and readiness state indicators, you have two options. You can either set the following property to create the /readyz and /livez additional paths:

management.endpoint.health.probes.add-additional-paths=true

or you can update the properties to include the correct health indicators:

management.endpoint.health.group.readiness.include=readinessState
management.endpoint.health.group.readiness.additional-path=server:/readyz
management.endpoint.health.group.liveness.include=livenessState
management.endpoint.health.group.liveness.additional-path=server:/livez
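Why the groups fail can be illustrated with a simplified Java model. This is not Spring Boot's implementation: `HealthGroupModel`, `Group` and `groupStatus` are hypothetical names. The model assumes that a group with no `include` covers every registered indicator and that a group reports DOWN if any of its members is DOWN, which matches the behavior described in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of health-group membership; NOT Spring Boot's implementation.
public class HealthGroupModel {

    enum Status { UP, DOWN }

    // An empty include set stands for "no include configured", i.e. every indicator.
    record Group(String name, Set<String> include) {
        boolean contains(String indicator) {
            return include.isEmpty() || include.contains(indicator);
        }
    }

    // The group is DOWN if any indicator it contains is DOWN, otherwise UP.
    static Status groupStatus(Group group, Map<String, Status> indicators) {
        for (Map.Entry<String, Status> entry : indicators.entrySet()) {
            if (group.contains(entry.getKey()) && entry.getValue() == Status.DOWN) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Status> indicators = new LinkedHashMap<>();
        indicators.put("livenessState", Status.UP);
        indicators.put("readinessState", Status.UP);
        indicators.put("my", Status.DOWN); // the always-DOWN indicator from the issue

        // Only additional-path configured: the user-defined group includes everything.
        Group readinessAll = new Group("readiness", Set.of());
        // Second option above: include only the readiness state indicator.
        Group readinessStateOnly = new Group("readiness", Set.of("readinessState"));

        System.out.println(groupStatus(readinessAll, indicators));       // DOWN
        System.out.println(groupStatus(readinessStateOnly, indicators)); // UP
    }
}
```

In this model, the only difference between the failing and passing probe is the `include` set, which is why adding `include=readinessState` (or `include=livenessState`) alongside `additional-path` restores the expected results.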

philwebb, Apr 13 '24 19:04

Flagging this for team discussion, since I wonder if we should do more to improve things. Perhaps if management.endpoint.health.probes.enabled=true is set and the management.endpoint.health.group.liveness configuration doesn't change include or exclude, we should keep the probe defaults.
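The rule proposed here could look roughly like the following. This is only a sketch of the idea, not how Spring Boot implements (or will implement) it: `ProbeDefaultsSketch`, `probeDefaults` and `effectiveMembers` are hypothetical names, and the default probe members are assumed to be the `livenessState` and `readinessState` indicators mentioned earlier in the thread.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed rule; NOT Spring Boot's implementation.
public class ProbeDefaultsSketch {

    // Assumed default membership of the probe groups (indicator names from this thread).
    static Set<String> probeDefaults(String group) {
        return switch (group) {
            case "liveness" -> Set.of("livenessState");
            case "readiness" -> Set.of("readinessState");
            default -> Set.of();
        };
    }

    // Empty include/exclude sets mean "not configured" (e.g. only additional-path was set).
    static Set<String> effectiveMembers(String group, boolean probesEnabled,
            Set<String> include, Set<String> exclude, Set<String> allIndicators) {
        boolean membershipConfigured = !include.isEmpty() || !exclude.isEmpty();
        Set<String> defaults = probeDefaults(group);
        if (probesEnabled && !membershipConfigured && !defaults.isEmpty()) {
            return new TreeSet<>(defaults); // keep the probe defaults
        }
        // Otherwise an unset include means every indicator, minus any excludes.
        Set<String> members = new TreeSet<>(include.isEmpty() ? allIndicators : include);
        members.removeAll(exclude);
        return members;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("livenessState", "readinessState", "my");
        // Only additional-path configured: readiness keeps its default membership.
        System.out.println(effectiveMembers("readiness", true, Set.of(), Set.of(), all)); // [readinessState]
        // include=my configured explicitly: the user's choice wins.
        System.out.println(effectiveMembers("readiness", true, Set.of("my"), Set.of(), all)); // [my]
    }
}
```

The key design point is that setting only `additional-path` no longer counts as redefining the group's membership; only an explicit `include` or `exclude` does.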

philwebb, Apr 13 '24 19:04

We're going to see if we can improve the default behavior so that the liveness and readiness groups have sensible memberships unless the user has specifically configured them otherwise. This would be a breaking change, so we can't treat it as a bug.

philwebb, Apr 17 '24 15:04

Thanks @philwebb! I've applied the first workaround you recommended (option 1) and it works as expected.

tharakadesilva, Apr 17 '24 18:04

In many of Josh Long's articles and videos, he recommends this pattern as a way to hide the actuator endpoints behind a different port. He makes no mention of this unexpected behavior, and it has definitely been a huge gotcha for our team.

matthenry87, Sep 05 '24 16:09