
Add native support for gRPC health check

Open cagecurrent opened this issue 4 months ago • 3 comments

Is your feature request related to a problem? Please describe.

I'm trying to set up a gRPC proxy as a front end to our Kafka server so that we can access it over gRPC (which we already use widely). I will deploy this Zilla-based proxy in a Kubernetes cluster. The cluster, however, requires a health endpoint, and when the probe is configured for gRPC, the endpoint needs to implement the standard gRPC health check protocol. If my proxy does not answer with SERVING, the pod will be taken down.

When running locally I use the grpc-health-probe tool, and I believe this is fully compatible with how the Kubernetes gRPC health-check works.

https://github.com/grpc-ecosystem/grpc-health-probe

The incoming request looks like this:

/grpc.health.v1.Health/Check

The request and response messages in the proto look like this:

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
  }
  ServingStatus status = 1;
}
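For reference, here is a minimal sketch of what a SERVING reply looks like on the wire, assuming standard proto3 encoding and the usual gRPC length-prefixed message framing (a 1-byte compression flag followed by a 4-byte big-endian length). The helper names `encode_health_response` and `grpc_frame` are hypothetical, chosen for illustration only; this is not Zilla code. It covers only the message body, not the HTTP/2 headers or the `grpc-status` trailer.

```python
import struct

# ServingStatus enum values, copied from the proto above.
SERVING_STATUS = {
    "UNKNOWN": 0,
    "SERVING": 1,
    "NOT_SERVING": 2,
    "SERVICE_UNKNOWN": 3,
}


def encode_health_response(status: str) -> bytes:
    """Protobuf-encode a HealthCheckResponse with the given status (hypothetical helper)."""
    value = SERVING_STATUS[status]
    if value == 0:
        # proto3 omits fields that hold their default value.
        return b""
    # Tag byte 0x08 = (field number 1 << 3) | wire type 0 (varint).
    # All enum values here are < 128, so the varint is a single byte.
    return bytes([0x08, value])


def grpc_frame(message: bytes) -> bytes:
    """Add the gRPC prefix: compression flag 0, then the 4-byte big-endian length."""
    return struct.pack(">BI", 0, len(message)) + message


frame = grpc_frame(encode_health_response("SERVING"))
print(frame.hex())  # 00000000020801
```

So a probe expects a 7-byte binary body in a gRPC response, rather than the literal text SERVING.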

Describe the solution you'd like

Ideally I would like to see native, built-in support for the gRPC health-check. For me it's enough to have it statically answer SERVING as long as Zilla is running properly, but others might need to be able to probe some of the upstream services and based on that respond accordingly.

Describe alternatives you've considered

An alternative solution would be to add a "direct response" value directly in the zilla.yaml file, telling Zilla to reply with SERVING, something similar to the config snippet below.

north_http_server:
  type: http
  kind: server
  routes:
    - when:
        - headers:
            :path: /grpc.health.v1.Health/*
      exit: north_health_server
north_health_server:
  type: direct_response
  kind: server
  options:
    response:
      value: 'SERVING'
      code: 200

Having a way to specify a direct response in a straightforward way like this would actually be really useful. But maybe it can already be accomplished using catalogs?

cagecurrent avatar Jul 04 '25 12:07 cagecurrent

@cagecurrent we have first-class support for gRPC in the grpc binding, so that seems like a natural place to potentially add support for the standard grpc.health.v1 service.

For example:

north_http_server:
  type: http
  kind: server
  exit: north_grpc_server
north_grpc_server:
  type: grpc
  kind: server
  options:
    services:
      - grpc.health.v1
  exit: ...

This design aligns with the proposed approach for the grpc.reflection.v1 enhancement, see https://github.com/aklivity/zilla/issues/954

Please confirm that this would work for your scenario.

jfallows avatar Jul 20 '25 23:07 jfallows

@jfallows, nice to see that you are working on this. Your solution seems OK, although I'm a bit too new to Zilla to give a nuanced response. As I mentioned, for me the most straightforward way for the health check to work is for it to just report that Zilla is running. I could see another scenario, however, where you want to probe some other service and base the health-check response on that.

I also tried to get Zilla to simply proxy the gRPC health-check request to another service (a "dummy" program running in the same Docker container, on a different port) to get a status response. I didn't get that to work, but that may have been due to my limited Zilla knowledge and experience.

cagecurrent avatar Jul 21 '25 05:07 cagecurrent

I would like to work on this ticket.

Qianyu2021 avatar Aug 15 '25 21:08 Qianyu2021