
AWS Discovery

Open · guersam opened this issue 2 years ago · 8 comments

Thanks for open-sourcing shardcake, @ghostdogpr and Devsisters!

I'd like to port a running Akka cluster to shardcake, and the biggest blocker is service discovery.

Our Akka cluster runs on ECS Fargate rather than Kubernetes, so we use the ECS discovery module provided by Akka Management:

https://doc.akka.io/docs/akka-management/current/discovery/aws.html

guersam, Sep 08 '22

This can be done easily by providing an implementation of PodsHealth that uses the ECS API; see https://devsisters.github.io/shardcake/docs/customization.html#health

The Kubernetes one is only a few lines long: https://github.com/devsisters/shardcake/blob/series/2.x/health-k8s/src/main/scala/com/devsisters/shardcake/K8sPodsHealth.scala

ghostdogpr, Sep 08 '22
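A minimal sketch of the approach suggested above, under stated assumptions. To stay dependency-free it mirrors the shape of shardcake's PodsHealth (one isAlive check per PodAddress) with a plain trait returning Boolean instead of ZIO's UIO[Boolean]. EcsTask, EcsTaskSource, PodsHealthLike and EcsPodsHealth are hypothetical names. A real EcsTaskSource would call the ECS ListTasks and DescribeTasks APIs for the cluster and read each Fargate task's private IPv4 address from its network interface details, which requires the ecs:ListTasks and ecs:DescribeTasks IAM permissions. Treating DEPROVISIONING, STOPPED and DELETED as dead, and an unreachable ECS API as alive, are assumptions chosen to err toward not reassigning shards.

```scala
import scala.util.{Failure, Success, Try}

// Mirrors shardcake's PodAddress / PodsHealth shape so the sketch runs
// without dependencies (assumption: the real isAlive returns UIO[Boolean]).
final case class PodAddress(host: String, port: Int)

trait PodsHealthLike {
  def isAlive(podAddress: PodAddress): Boolean
}

// Hypothetical view of an ECS task: its private IPv4 address and lastStatus.
final case class EcsTask(privateIp: String, lastStatus: String)

// Hypothetical seam in front of ECS ListTasks + DescribeTasks.
// A real implementation would call the AWS SDK here.
trait EcsTaskSource {
  def tasks(): Seq[EcsTask]
}

final class EcsPodsHealth(source: EcsTaskSource) extends PodsHealthLike {
  // Statuses after which the container is gone (conservative assumption).
  private val terminal = Set("DEPROVISIONING", "STOPPED", "DELETED")

  def isAlive(podAddress: PodAddress): Boolean =
    Try(source.tasks()) match {
      // If ECS cannot be queried, report "alive": reassigning the shards of
      // a pod that is still running could leave a shard on two pods.
      case Failure(_) => true
      // Alive only if a non-terminal task owns the pod's IP address.
      case Success(tasks) =>
        tasks.exists(t => t.privateIp == podAddress.host && !terminal(t.lastStatus))
    }
}
```

Matching on the host alone assumes each task runs a single shardcake pod and that a stopped task's IP has not already been reused by a newer task.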

Btw if someone implements it, we'll happily accept the contribution!

ghostdogpr, Sep 08 '22

Conceptual question about this: are these infrastructure-specific health checks in any way superior to the built-in "ping" health check? That one is also very reliable (right?), works out of the box, and doesn't require setting up access permissions to the infrastructure API for your application.

thiloplanz, Sep 08 '22

In case of a network issue, the ping might fail even though the pod is actually alive and processing messages. The infrastructure (like Kubernetes), however, knows whether the pod is alive because it is in charge of the pod's lifecycle. Basically, we rely on the infrastructure's built-in logic to handle things like cluster splits.

ghostdogpr, Sep 08 '22

Hmm. I guess that can cut both ways. If the ping fails because of network issues, payload messages might fail for the same reason, even though Kubernetes knows that the pod is alive. In that scenario, the ping health check is closer to "the proof is in the pudding". 🤔

thiloplanz, Sep 08 '22

We don't want to rebalance in that case because we're not sure the pod is gone. Otherwise, you might end up with the same shard on two different pods.

ghostdogpr, Sep 08 '22
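The trade-off discussed in this exchange can be written down as a small decision function. This is a hypothetical illustration of the reasoning in the thread, not shardcake's actual source: a failed ping alone never triggers a rebalance, and shards are only reassigned once the infrastructure confirms the pod is gone. Decision and UnresponsivePodPolicy are made-up names for the example.

```scala
// Possible outcomes when a pod's liveness is checked (hypothetical names).
sealed trait Decision
object Decision {
  case object KeepShards extends Decision
  case object ReassignShards extends Decision
}

object UnresponsivePodPolicy {
  import Decision._

  // infraSaysAlive is by-name: the infrastructure API is only queried
  // when the ping has already failed.
  def decide(pingSucceeded: Boolean, infraSaysAlive: => Boolean): Decision =
    if (pingSucceeded) KeepShards       // pod answered: nothing to do
    else if (infraSaysAlive) KeepShards // unreachable but still running: wait, so no shard gets two owners
    else ReassignShards                 // infrastructure confirms the pod is gone
}
```

The middle branch is the case debated above: the pod may be unreachable for payload messages too, but moving its shards while it still runs risks the same shard living on two pods.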

Hello! I'm wondering how to test it during development, considering that LocalStack's EKS module is only available in the Pro version.

grouzen, Nov 22 '23

> Hello! I'm wondering how to test it during development, considering that LocalStack's EKS module is only available in the Pro version.

I don't have a great solution for that; we tested the k8s one in a real environment...

ghostdogpr, Nov 23 '23
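One way to exercise an ECS-backed health check during development, without LocalStack Pro or an AWS account, is to put the ECS calls behind a small interface and drive it with an in-memory fake. This is a sketch under that assumption; FakeEcs, EcsTaskSource and EcsHealthCheck are hypothetical names, and the "terminal status" and "alive on API failure" rules are assumed choices. It covers the health-check logic (task lifecycle, API outage) but not IAM permissions or real ECS responses, which still need a real environment.

```scala
import scala.collection.mutable
import scala.util.Try

final case class PodAddress(host: String, port: Int)
final case class EcsTask(privateIp: String, lastStatus: String)

// Hypothetical seam in front of ECS ListTasks/DescribeTasks.
trait EcsTaskSource {
  def tasks(): Seq[EcsTask]
}

// In-memory stand-in for ECS: tests mutate it to simulate tasks starting,
// stopping, or the API being unreachable.
final class FakeEcs extends EcsTaskSource {
  private val statusByIp = mutable.Map.empty[String, String]
  var failing: Boolean = false

  def put(ip: String, lastStatus: String): Unit = statusByIp.update(ip, lastStatus)

  def tasks(): Seq[EcsTask] =
    if (failing) throw new RuntimeException("simulated ECS API outage")
    else statusByIp.toSeq.map { case (ip, status) => EcsTask(ip, status) }
}

// The check under test: alive only if a non-terminal task owns the IP,
// and alive when ECS cannot be queried (never reassign on uncertainty).
final class EcsHealthCheck(source: EcsTaskSource) {
  private val terminal = Set("DEPROVISIONING", "STOPPED", "DELETED")

  def isAlive(pod: PodAddress): Boolean =
    Try(source.tasks())
      .map(_.exists(t => t.privateIp == pod.host && !terminal(t.lastStatus)))
      .getOrElse(true)
}
```

A test can then walk a task through RUNNING, STOPPING and STOPPED, and flip failing to simulate an outage, asserting the expected isAlive result at each step.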