Automatically pause Kafka consumers when database is down

Open • nscuro opened this issue 2 years ago • 1 comment

Currently, when either the API server or the Hyades services are unable to connect to the database, the database health check switches to the DOWN state. When deployed to Kubernetes, this causes the respective pods to be removed from the set of endpoints that receive incoming HTTP traffic.

However, Kafka consumers (Kafka Streams, Parallel Consumer) continue to receive records from the Kafka broker. In most cases, processing these records requires database access, so while the database is unreachable, all processing is doomed to fail. It would be preferable to stop consuming during the database outage and resume once the database is available again.

Both Kafka Streams and the Confluent Parallel Consumer can be paused and resumed:

  • https://kafka.apache.org/36/javadoc/org/apache/kafka/streams/KafkaStreams.html#pause()
  • https://github.com/confluentinc/parallel-consumer/blob/075b4f91d7e10c905d3e7319166500c848dea3c8/parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/ParallelConsumer.java#L55-L74

We need a way to detect when the database health checks start to fail, and to pause the consumers when that happens. Likewise, the consumers must be resumed once the health checks recover.
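A minimal Java sketch of that detection logic, assuming the database health status can be polled periodically. `DatabaseHealthConsumerGuard`, `PausableConsumer`, and the `BooleanSupplier` health source are hypothetical names; in practice `PausableConsumer` would wrap `KafkaStreams#pause()`/`#resume()` or the pause/resume methods of a `ParallelConsumer` linked above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch: pause all registered consumers when the database health check goes DOWN,
// resume them when it comes back UP. All names here are hypothetical.
class DatabaseHealthConsumerGuard {

    // Hypothetical adapter over the real pause/resume APIs, e.g.
    // KafkaStreams#pause()/#resume() or the pause/resume methods of ParallelConsumer.
    interface PausableConsumer {
        void pause();
        void resume();
    }

    private final BooleanSupplier databaseHealthy; // true while the DB health check reports UP
    private final List<PausableConsumer> consumers;
    private boolean paused = false;

    DatabaseHealthConsumerGuard(BooleanSupplier databaseHealthy, List<PausableConsumer> consumers) {
        this.databaseHealthy = databaseHealthy;
        this.consumers = List.copyOf(consumers);
    }

    // Meant to be called periodically (e.g. from a ScheduledExecutorService).
    // Acts only on transitions, so repeated DOWN results do not pause twice.
    synchronized void check() {
        boolean healthy = databaseHealthy.getAsBoolean();
        if (!healthy && !paused) {
            consumers.forEach(PausableConsumer::pause);
            paused = true;
        } else if (healthy && paused) {
            consumers.forEach(PausableConsumer::resume);
            paused = false;
        }
    }

    synchronized boolean isPaused() {
        return paused;
    }

    public static void main(String[] args) {
        AtomicBoolean dbUp = new AtomicBoolean(true);
        PausableConsumer logging = new PausableConsumer() {
            @Override public void pause() { System.out.println("paused"); }
            @Override public void resume() { System.out.println("resumed"); }
        };
        DatabaseHealthConsumerGuard guard = new DatabaseHealthConsumerGuard(dbUp::get, List.of(logging));

        guard.check();   // DB up: nothing happens
        dbUp.set(false);
        guard.check();   // DB down: prints "paused"
        guard.check();   // still down: no second pause
        dbUp.set(true);
        guard.check();   // DB recovered: prints "resumed"
    }
}
```

Reacting only to state transitions keeps the guard idempotent: while the outage lasts, repeated DOWN results do not call `pause()` again, and `resume()` is called exactly once on recovery.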

nscuro • Nov 07 '23 11:11

https://github.com/quarkusio/quarkus/discussions/36918

nscuro • Nov 08 '23 10:11

Closing, as the API server was migrated to the Confluent parallel-consumer, which provides us with head-of-line (HOL) blocking semantics.
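Roughly speaking, HOL blocking with ordered processing means a record that cannot be processed is retried instead of skipped, so nothing queued behind it makes progress while the database is down. The Java sketch below models this for a single ordered queue (one partition or one key); `HeadOfLineBlockingSketch` and `drain` are hypothetical names, and this is a conceptual illustration, not how parallel-consumer implements it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Conceptual model of head-of-line (HOL) blocking for a single ordered queue.
// This is NOT parallel-consumer's implementation; all names are hypothetical.
class HeadOfLineBlockingSketch {

    // Processes records strictly in order. A record that fails stays at the head
    // and is retried; nothing behind it is processed until it succeeds.
    // maxAttempts bounds consecutive failures so the sketch terminates; a real
    // consumer would back off and keep retrying instead.
    static <T> List<T> drain(Deque<T> queue, Predicate<T> process, int maxAttempts) {
        List<T> processed = new ArrayList<>();
        int failures = 0;
        while (!queue.isEmpty() && failures < maxAttempts) {
            T head = queue.peekFirst();
            if (process.test(head)) {
                queue.pollFirst();   // success: remove it from the head, move to the next record
                processed.add(head);
                failures = 0;
            } else {
                failures++;          // failure: keep the record at the head and retry it
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Deque<String> partition = new ArrayDeque<>(List.of("r1", "r2", "r3"));
        boolean[] dbUp = {false};

        // Database down: r1 keeps failing, so r2 and r3 are never attempted.
        System.out.println(drain(partition, r -> dbUp[0], 3) + " remaining=" + partition);

        // Database back up: processing resumes from r1, in order.
        dbUp[0] = true;
        System.out.println(drain(partition, r -> dbUp[0], 3) + " remaining=" + partition);
    }
}
```

Because the blocked head record stops progress on its own, the consumer stops doing useful work during an outage without an explicit pause, which is why the health-check-driven approach was no longer needed.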

nscuro • Jun 05 '24 10:06