Actuator configuration for endpoint exposure is overly complicated

Open ThomasKasene opened this issue 1 year ago • 3 comments

Hello 👋 At the risk of sounding a bit slow, I'd like to raise awareness of something that's bothering me a little: The state of the management.* configuration properties, and specifically the ones used to decide which endpoints are being made available.

Consider the following case: I'm running my app in Kubernetes and use Prometheus to scrape its metrics, and I want it to expose only the following actuator endpoints:

  • GET /actuator/health/liveness
  • GET /actuator/health/readiness
  • GET /actuator/prometheus

I've been trying to make this as compact as possible, and maybe there's a magic combination of properties and values that I've missed, but this is where I'm currently at:

management:
  endpoints:
    enabled-by-default: false # fair enough
    web.exposure.include: health,prometheus # expose the two endpoints over HTTP
  endpoint:
    health:
      enabled: true # required, otherwise its sub-endpoints (liveness and readiness) won't work, for some reason
      probes.enabled: true # fair enough
    prometheus.enabled: true # fair enough
  prometheus.metrics.export.enabled: true # I don't know what this actually controls, but it's necessary

If I understand the documentation correctly, an endpoint has two attributes, both of which contribute to whether or not it's actually available: enabled|disabled and exposed|hidden. I think this is the root cause of most of the confusion I'm dealing with when trying to understand this system.

I know it's a bold proposition, but it's late and I'm sleepy so here goes: Apart from backwards compatibility, is there any particular reason why it isn't structured more plainly, like in the following configuration instead? Have any use-cases that go beyond my very limited one made it necessary, for example?

management:
  endpoints.enabled-by-default: false
  web.endpoint:
    health.probes.enabled: true
    prometheus.enabled: true
  prometheus.metrics.export.enabled: true # I still don't understand what this does, but leaving it in just for laughs

ThomasKasene avatar Feb 15 '24 22:02 ThomasKasene

Endpoints can be enabled or disabled, and additionally can be exposed over HTTP and/or JMX. All endpoints are enabled by default, except the shutdown one (documentation). The only endpoint which is by default exposed (both on web and JMX) is the health one (documentation).

If I understand correctly, your goal is to have health (with both the liveness and readiness groups) and prometheus accessible over HTTP. The configuration to do that is:

management:
  endpoints:
    web.exposure.include: health, prometheus

The rest is not necessary, as the health and prometheus endpoints are enabled by default. health.probes.enabled: true isn't needed either: as documented here, the probes are automatically enabled if Kubernetes is detected.

management.prometheus.metrics.export.enabled doesn't have to be explicitly set to true, as this is its default value.

Does that make things more clear?

mhalbritter avatar Feb 16 '24 09:02 mhalbritter

Thanks for your answer and suggested solution!

All other endpoints except health and prometheus are still enabled, hence the need to disable them all first. Of course, I still don't know what it means that an endpoint is "enabled", but my intuition tells me that something is still running behind the scenes and eats up resources unnecessarily, even if the end result isn't exposed. Besides, is it really an endpoint if it cannot be reached somehow?

I actually don't care about the /actuator/health "root" endpoint. I just need the liveness and readiness endpoints to exist. But I do still want them to exist outside of Kubernetes because I want my automated tests to reach them during build time to verify that I haven't accidentally hidden them behind Spring Security, for example.
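
A build-time check along those lines could look roughly like the following. It is only a sketch in plain Java using java.net.http: the class name ProbeSmokeCheck and the base URL http://localhost:8080 are assumptions for illustration, and the probe paths are the ones listed at the top of this issue. A status other than 200 (for example 401 or 403) would hint that the probes ended up behind Spring Security.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Spring Boot.
final class ProbeSmokeCheck {

    // Builds the probe URL, e.g. http://localhost:8080/actuator/health/liveness
    static URI probeUri(String baseUrl, String probe) {
        return URI.create(baseUrl + "/actuator/health/" + probe);
    }

    // Returns true only when the probe answers with HTTP 200.
    static boolean isReachable(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode() == 200;
        } catch (IOException e) {
            return false; // nothing listening, timeout, or connection failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed address of the application under test; pass another one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newHttpClient();
        for (String probe : new String[] {"liveness", "readiness"}) {
            URI uri = probeUri(baseUrl, probe);
            System.out.println(uri + " reachable: " + isReachable(client, uri));
        }
    }
}
```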

I suppose I could omit setting management.prometheus.metrics.export.enabled, but its existence adds to the complexity of the overall system.

By the way, while I appreciate the solution, this issue is mostly an effort to illustrate that some users (or at least one 😁) find the current scheme a little confusing and daunting to work with. 😃

ThomasKasene avatar Feb 16 '24 18:02 ThomasKasene

Of course, I still don't know what it means that an endpoint is "enabled", but my intuition tells me that something is still running behind the scenes and eats up resources unnecessarily, even if the end result isn't exposed.

I don't think this is the case. We have a condition that checks that an endpoint is both enabled and exposed to create its infrastructure.

See https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot-actuator-autoconfigure/src/main/java/org/springframework/boot/actuate/autoconfigure/endpoint/condition/OnAvailableEndpointCondition.java
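
To make the rule concrete, here is a minimal plain-Java model of it. This is a simplification for illustration, not the code of OnAvailableEndpointCondition: the class EndpointAvailability and its method are hypothetical names, and only web and JMX exposure are modelled.

```java
import java.util.Set;

// Simplified model of "available = enabled AND exposed"; not Spring Boot's implementation.
final class EndpointAvailability {

    // An endpoint counts as available only when it is enabled and exposed
    // over at least one technology (web or JMX in this sketch).
    static boolean isAvailable(String id, Set<String> enabled,
                               Set<String> webExposed, Set<String> jmxExposed) {
        boolean exposed = webExposed.contains(id) || jmxExposed.contains(id);
        return enabled.contains(id) && exposed;
    }

    public static void main(String[] args) {
        Set<String> enabled = Set.of("health", "info", "prometheus"); // shutdown left disabled
        Set<String> web = Set.of("health", "prometheus");
        Set<String> jmx = Set.of();
        for (String id : new String[] {"health", "info", "prometheus", "shutdown"}) {
            System.out.println(id + " available: " + isAvailable(id, enabled, web, jmx));
        }
    }
}
```

In this model info is enabled but not exposed, so it is not available, and shutdown would stay unavailable even if it were exposed, because it is not enabled.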

bclozel avatar Feb 16 '24 18:02 bclozel

All other endpoints except health and prometheus are still enabled, hence the need to disable them all first. Of course, I still don't know what it means that an endpoint is "enabled", but my intuition tells me that something is still running behind the scenes and eats up resources unnecessarily, even if the end result isn't exposed. Besides, is it really an endpoint if it cannot be reached somehow?

Yes, endpoints are still "running", meaning there is a bean in the context if the endpoint is enabled, regardless of whether it's exposed.

The actuator system is built as a layered system. The endpoint beans are created if they are enabled. You could for example do something like this:

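import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.actuate.info.InfoEndpoint;
import org.springframework.stereotype.Component;

// Prints the info endpoint's data at startup by calling the endpoint bean directly.
@Component
class CLR implements CommandLineRunner {
    private final InfoEndpoint infoEndpoint;

    // The InfoEndpoint bean is injected by Spring; startup fails if no such bean exists.
    CLR(InfoEndpoint infoEndpoint) {
        this.infoEndpoint = infoEndpoint;
    }

    @Override
    public void run(String... args) {
        System.out.println(this.infoEndpoint.info());
    }
}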

This works if the info endpoint is enabled, and it works regardless of whether it is exposed. So you could, for example, write your own transport mechanism for the endpoints.

On top of that, Spring Boot defines two default ways to expose an endpoint: HTTP and JMX, which can be configured separately.

In your case, if you really want only the exposed endpoints to exist as beans, and you want the health groups enabled regardless of Kubernetes, your minimal config is this:

management:
  endpoints:
    enabled-by-default: false
    web.exposure.include: health, prometheus
  endpoint:
    health:
      enabled: true
      probes.enabled: true
    prometheus.enabled: true

But I wouldn't describe it as "overly complicated": this config results from your choice to optimize the number of created beans and to expose the health groups for your tests.

Is there anything we can/should do in your opinion?

mhalbritter avatar Feb 19 '24 08:02 mhalbritter

What I said is not correct. Endpoint beans are only in the context when they are exposed over JMX or web (or CloudFoundry). What Brian said is correct: in my test project, IntelliJ was fiddling with my system properties by helpfully adding -Dspring.jmx.enabled=true and -Dmanagement.endpoints.jmx.exposure.include=* to the java command.

mhalbritter avatar Feb 19 '24 08:02 mhalbritter

So this config:

management:
  endpoints:
    web.exposure.include: health, prometheus
  endpoint:
    health:
      probes.enabled: true

is really the minimal config for your use case: it exposes health (and its subgroups, regardless of whether the app is running in k8s) and prometheus.

Endpoints which are not exposed (like info, etc.) are not created and not "running". So there's no need to fiddle with the enabled properties.

mhalbritter avatar Feb 19 '24 08:02 mhalbritter

Thanks for your replies! If I understand you both correctly, there are conditions in place to prevent bean creation unless the endpoint in question is both enabled and exposed. Like I said initially, maybe I'm just slow, but I still don't fully understand why "exposed" does not equal "available" 😅 Why is there an additional "enabled" flag which does essentially the same thing?

It's starting to dawn on me that maybe it's in order to make feature toggling simpler. I daresay it's easier with a boolean flag than juggling different combinations of endpoint names, i.e. in the management.endpoints.web.exposure.include property. But if that's the only reason, I think I'd still prefer to define the exposure on a per-endpoint basis:

management:
  endpoint:
    health:
      exposure: http,jmx
      probes.enabled: true
    prometheus:
      exposure: http
      enabled: ${PROMETHEUS_ENABLED} # feature toggle based on env variable, etc

I dunno, it just makes more sense to me that all the properties are grouped together where they belong, rather than having some here and some there 😃

ThomasKasene avatar Feb 19 '24 19:02 ThomasKasene

I don't know the exact reasons why the enabled flag was introduced, but I guess there's a reason for it.

But let's say we didn't have the enabled flag, and you configured

management:
  endpoints:
    web.exposure.include: "*" # quoted, because a bare * has a special meaning in YAML

then the shutdown endpoint would be exposed, and anyone could shut down the application. Having to explicitly enable the shutdown endpoint (as it's disabled by default) solves that problem.
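
A small plain-Java sketch of that scenario, under stated assumptions: WildcardExposure and reachable are hypothetical names, the list of known endpoints is shortened to four, and the logic only imitates how a wildcard include combines with the enabled flag. It is not how Spring Boot resolves exposure internally.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model only; not Spring Boot's exposure resolution.
final class WildcardExposure {

    // Resolves an include list ("*" means every known endpoint) and keeps
    // only the endpoints that are also enabled.
    static Set<String> reachable(List<String> known, Set<String> enabled, List<String> include) {
        Set<String> result = new TreeSet<>();
        for (String id : known) {
            boolean included = include.contains("*") || include.contains(id);
            if (included && enabled.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> known = List.of("health", "info", "prometheus", "shutdown");
        // With the enabled flag: shutdown is disabled by default.
        Set<String> enabledByDefault = Set.of("health", "info", "prometheus");
        System.out.println(reachable(known, enabledByDefault, List.of("*")));
        // Without the enabled flag: every known endpoint behaves as if enabled.
        System.out.println(reachable(known, Set.copyOf(known), List.of("*")));
    }
}
```

Only the second call includes shutdown in its result, which is the accidental exposure the enabled flag guards against.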

mhalbritter avatar Feb 20 '24 07:02 mhalbritter

Enablement and exposure are different concepts, and they were introduced separately for the reason Moritz explained. The "*" exposure value is often better than listing all possible values manually.

For example, our default setup enables all endpoints but shutdown and only exposes health. We could indeed switch that to:

management:
  endpoints:
    web.exposure:
      include: health
      exclude: shutdown

If we wanted to apply such changes, we could only warn users in logs and in IDE contextual information. There is a high chance that any application still relying on shutdown not being enabled at all would expose it by accident when upgrading to the latest Spring Boot version. All apps overriding the exclude property would be impacted. Third-party libraries can also contribute endpoints, which complicates things even further.

Given this assessment, I think we should leave things as they are and reconsider this issue if we get more similar feedback from the community. Thanks for your report!

bclozel avatar Mar 25 '24 09:03 bclozel