Provide a gRPC server health check

Open asarkar opened this issue 5 years ago • 12 comments

The problem

Monitor the health of a gRPC server app.

The solution

Provide a gRPC server HealthEndpoint. My blog post on this matter may be useful. https://blog.asarkar.com/technical/grpc-kubernetes-spring/

Alternatives considered

Implement myself.

Additional context

Also note that starting with version 2.3.0.RELEASE, Spring Boot provides liveness and readiness information under the Actuator health endpoint. See Kubernetes Probes for details.

asarkar avatar Sep 28 '20 14:09 asarkar

I'm not sure what exactly you are asking for.

  • Export some kind of HealthIndicator for the grpc-server to spring actuator,
  • or use the HealthIndicators to populate a grpcHealthEndpoint call/service that mirrors the web endpoint,
  • or use the HealthIndicators to populate the HealthStatusManager

ST-DDT avatar Sep 28 '20 14:09 ST-DDT

Export some kind of HealthIndicator for the grpc-server to spring actuator

This. I've a Kotlin implementation that I can share, which I'm sure can be easily turned into Java.

asarkar avatar Sep 28 '20 15:09 asarkar

I'm not aware of any useful grpc-server availability indicator, so I'm very interested in your implementation. Can you link it here? I will decide once I have seen the implementation.

EDIT: AFAIK the default http server doesn't have one either.

ST-DDT avatar Sep 28 '20 15:09 ST-DDT

open class GrpcServerHealthIndicator internal constructor(private val healthChannel: ManagedChannel) : HealthIndicator {
    internal constructor(port: Int, channelBuilderClass: String) : this(newChannel(port, channelBuilderClass))

    private val healthStub: HealthGrpc.HealthBlockingStub = HealthGrpc.newBlockingStub(healthChannel)

    private var availabilityChangeEventPublishMethod: Method? = null
    private var publishAvailabilityChangeEvent: Boolean = true
    private var readinessStateRefusingTraffic: Enum<*>? = null
    private var livenessStateBroken: Enum<*>? = null

    @Autowired
    lateinit var context: ApplicationContext

    @PostConstruct
    open fun postConstruct() {
        publishAvailabilityChangeEvent = context.containsBean("livenessStateHealthIndicator") &&
                context.containsBean("readinessStateHealthIndicator")
        if (publishAvailabilityChangeEvent) {
            log.info("Kubernetes probes are enabled")
        } else {
            log.info("Kubernetes probes not enabled")
            return
        }

        try {
            val availabilityChangeEventClass =
                Class.forName("org.springframework.boot.availability.AvailabilityChangeEvent")
            val availabilityStateClass =
                Class.forName("org.springframework.boot.availability.AvailabilityState")
            availabilityChangeEventPublishMethod = ReflectionUtils.findMethod(
                availabilityChangeEventClass, "publish",
                ApplicationContext::class.java, availabilityStateClass
            )
            val readinessStateClass = Class.forName("org.springframework.boot.availability.ReadinessState")
            readinessStateRefusingTraffic = readinessStateClass.enumConstants
                .map { it as Enum<*> }
                .firstOrNull { it.name == "REFUSING_TRAFFIC" }
            val livenessStateClass = Class.forName("org.springframework.boot.availability.LivenessState")
            livenessStateBroken = livenessStateClass.enumConstants
                .map { it as Enum<*> }
                .firstOrNull { it.name == "BROKEN" }
            publishAvailabilityChangeEvent = true
        } catch (ex: ReflectiveOperationException) {
            publishAvailabilityChangeEvent = false
            log.error(ex.message, ex)
        }
    }

    @PreDestroy
    open fun preDestroy() {
        healthChannel.shutdown()
    }

    // c.f. org.springframework.boot.actuate.availability package
    override fun health(): Health {
        val request = HealthCheckRequest.getDefaultInstance()
        val builder = Health.Builder()
        try {
            val response = healthStub.check(request)
            when (response.status) {
                HealthCheckResponse.ServingStatus.SERVING -> builder.up()
                HealthCheckResponse.ServingStatus.NOT_SERVING -> builder.outOfService()
                else -> builder.down()
            }
        } catch (ex: Exception) {
            builder.down(ex)
        }
        val health = builder.build()

        if (publishAvailabilityChangeEvent) {
            if (health.status == Status.OUT_OF_SERVICE) availabilityChangeEventPublishMethod?.invoke(
                null,
                context,
                readinessStateRefusingTraffic
            )
            else if (health.status != Status.UP) availabilityChangeEventPublishMethod?.invoke(
                null,
                context,
                livenessStateBroken
            )
        }

        return health
    }

    companion object {
        private val log: Logger = LoggerFactory.getLogger(GrpcServerHealthIndicator::class.java)

        private fun newChannel(port: Int, channelBuilderClass: String): ManagedChannel {
            val forAddressMethod = ReflectionUtils.findMethod(
                Class.forName(channelBuilderClass),
                "forAddress",
                String::class.java, Int::class.java
            )
            check(forAddressMethod != null) { "Could not find NettyChannelBuilder.forAddress(String, int) method" }

            var nettyChannelBuilder = forAddressMethod.invoke(null, "localhost", port)

            val usePlaintextMethod = ReflectionUtils.findMethod(
                nettyChannelBuilder.javaClass,
                "usePlaintext"
            )
            check(usePlaintextMethod != null) { "Could not find NettyChannelBuilder.usePlaintext() method" }
            nettyChannelBuilder = usePlaintextMethod.invoke(nettyChannelBuilder)

            val buildMethod = ReflectionUtils.findMethod(
                nettyChannelBuilder.javaClass,
                "build"
            )
            check(buildMethod != null) { "Could not find NettyChannelBuilder.build() method" }

            val channel = buildMethod.invoke(nettyChannelBuilder)
            return channel as ManagedChannel
        }
    }
}

asarkar avatar Sep 28 '20 15:09 asarkar

AFAICT this code creates a health stub that connects to its own server to query the health service. I'm not sure which HealthService is called (no imports), but the response doesn't carry any information beyond "a call was successful". The actual response value could (theoretically) be directly called via code without the network io.

I'll have a look at other server libraries whether they implement this kind of ping HealthIndicator.

As for publishing the availabilityChangeEvent, this is up to the custom user code. This library should not decide whether the application is broken/down/unavailable. The user however might use any existing HealthIndicator for this:

management.endpoint.health.group.liveness.include=livenessState,grpcServer

ST-DDT avatar Sep 29 '20 17:09 ST-DDT

which HealthService is called

The one available via HealthStatusManager.getHealthService() and described here. I don't register it, so grpc-spring-boot-starter must be doing so.

The actual response value could (theoretically) be directly called via code without the network io.

I'm not sure I understand how, could you elaborate?

publishing the availabilityChangeEvent

After speaking with the Spring Boot team on Twitter, it appears that the intended design is not to update application availability from inside a HealthIndicator, but instead to add the health indicator to the liveness and/or readiness groups.
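
For illustration, a minimal configuration sketch of that approach. It assumes the indicator is registered as a bean named grpcServerHealthIndicator (a hypothetical name), which Spring Boot would expose under the id grpcServer, and that Spring Boot 2.3 or later is in use:

```properties
# Expose the liveness and readiness groups even outside Kubernetes
management.endpoint.health.probes.enabled=true
# Add the (hypothetical) gRPC server indicator to either or both groups
management.endpoint.health.group.liveness.include=livenessState,grpcServer
management.endpoint.health.group.readiness.include=readinessState,grpcServer
```

Whether the indicator belongs in the liveness group, the readiness group, or both is an application-level decision.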

asarkar avatar Sep 29 '20 19:09 asarkar

which HealthService is called

The one available via HealthStatusManager.getHealthService() and described here. I don't register it, so grpc-spring-boot-starter must be doing so.

The actual response value could (theoretically) be directly called via code without the network io.

I'm not sure I understand how, could you elaborate?

There is a bean for that. Fun fact: that bean is never populated with actual health data unless you do it yourself (like you would do for Spring), aside from some trivial startup and shutdown states.

The bean/service might be removed in a future release, because it doesn't add any value by itself (or I might replace it with a bridge to actuator).
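
As a rough sketch of what "doing it yourself" or such a bridge could look like (this is not the library's implementation): the helper names toServingStatus and publishStatus are hypothetical, and the import assumes a grpc-java version that ships HealthStatusManager under io.grpc.protobuf.services (older releases expose it under io.grpc.services).

```kotlin
import io.grpc.health.v1.HealthCheckResponse.ServingStatus
import io.grpc.protobuf.services.HealthStatusManager
import org.springframework.boot.actuate.health.Status

// Hypothetical helper: translate an actuator Status into a gRPC serving status.
fun toServingStatus(status: Status): ServingStatus =
    when (status) {
        Status.UP -> ServingStatus.SERVING
        Status.OUT_OF_SERVICE, Status.DOWN -> ServingStatus.NOT_SERVING
        else -> ServingStatus.UNKNOWN
    }

// Hypothetical helper: push the translated status into the HealthStatusManager.
// The default service name is the empty string, which the gRPC health protocol
// uses for the server as a whole.
fun publishStatus(
    manager: HealthStatusManager,
    status: Status,
    serviceName: String = HealthStatusManager.SERVICE_NAME_ALL_SERVICES
): ServingStatus {
    val servingStatus = toServingStatus(status)
    manager.setStatus(serviceName, servingStatus)
    return servingStatus
}
```

Something would still have to call publishStatus whenever the application's health changes, for example from a scheduled task or an event listener; that wiring is left out here.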

ST-DDT avatar Sep 29 '20 19:09 ST-DDT

So is there anything left to do here?

ST-DDT avatar Sep 29 '20 19:09 ST-DDT

So is there anything left to do here?

Didn't you say you are going to look into adding a HealthIndicator as shown?

asarkar avatar Sep 29 '20 19:09 asarkar

Yes, I did. Sorry, you are right.

TODO: Check whether the web server has a "self ping" health indicator and implement one for grpc as well.

ST-DDT avatar Sep 29 '20 20:09 ST-DDT

There is a bean for that.

That I noticed, but the HealthStatusManager only holds a reference to the Health service; it is not the Health service itself. When I tried to register the Health service, I got a duplicate service error. There's some code that registers the Health service, and it's not the HealthStatusManager. Just to be clear, I'm happy someone else did it for me, but I'm just curious where it's done.

asarkar avatar Sep 29 '20 20:09 asarkar

https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/grpc-server-spring-boot-autoconfigure/src/main/java/net/devh/boot/grpc/server/serverfactory/AbstractGrpcServerFactory.java#L115

ST-DDT avatar Sep 29 '20 20:09 ST-DDT