
[Serverless] Case: monolithic deployment

Open alexeyzimarev opened this issue 2 years ago • 7 comments

How to make subscriptions work in a serverless world?

Simplest scenario: monolithic deployment

The whole application represents a single bounded context, where commands and queries are combined into a single deployment unit. Such an application would include all the necessary components:

  • Command services
  • Command APIs
  • Query services
  • Read model subscriptions
  • Gateways

There are three ways to make it work:

  1. Just do nothing. Expect the function/container timeout to be enough for all the subscriptions to handle newly produced events. It would work in many cases, but there's no resilience against external infrastructure failures for event handler targets.
  2. Ensure the command is handled in an asynchronous unit of work. The command service would do its thing, as usual; there's no change required. Subscriptions would expose an internal HTTP API to report the last processed event position. A new function is needed to poll the API from inside the service, simulating traffic until all the subscriptions get beyond the position of the last produced event. It will prevent the function from being shut down (in theory).
  3. Ensure the command is handled in a synchronous unit of work. The command API will block until all subscriptions handle all the produced events in the application. It requires some method of internal asynchronous communication. Subscriptions will produce internal messages with the latest processed event position. The command API can observe these messages, so it can ensure that all the new events are handled. After that's confirmed, the command API returns the response as usual.
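Option 2 above boils down to a polling loop. Here is a minimal Python sketch of that idea, not Eventuous code: `fetch_positions` is a hypothetical callable standing in for the internal HTTP API that reports the last processed position per subscription, and the timeout and interval values are arbitrary assumptions.

```python
import time
from typing import Callable, Dict


def wait_until_caught_up(
    fetch_positions: Callable[[], Dict[str, int]],  # hypothetical: wraps the internal HTTP API
    target_position: int,
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every subscription reports a position >= target_position.

    Returns True when all subscriptions caught up, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        positions = fetch_positions()
        # An empty report is treated as "not caught up yet" rather than success.
        if positions and all(p >= target_position for p in positions.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

Each poll doubles as the simulated traffic mentioned in option 2, so whether it really keeps the instance alive depends on the platform.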

The last point also applies to non-serverless scenarios, where we must ensure that read models are updated. It essentially implements the Wait strategy from the docs.
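The synchronous variant can be sketched with an in-process notification instead of polling. This is an illustrative Python sketch under assumptions, not the Eventuous implementation: `CheckpointTracker`, `report` and `wait_for` are hypothetical names, and the "internal messages" are modelled with an `asyncio.Condition`.

```python
import asyncio
from typing import Dict, Iterable


class CheckpointTracker:
    """Tracks the last processed position per subscription (hypothetical helper).

    Create it inside a running event loop so the Condition binds to that loop
    on older Python versions.
    """

    def __init__(self, subscription_ids: Iterable[str]) -> None:
        self._positions: Dict[str, int] = {sid: -1 for sid in subscription_ids}
        self._changed = asyncio.Condition()

    async def report(self, subscription_id: str, position: int) -> None:
        # Called by a subscription after it has handled an event.
        async with self._changed:
            if position > self._positions[subscription_id]:
                self._positions[subscription_id] = position
            self._changed.notify_all()

    async def wait_for(self, position: int, timeout_s: float) -> bool:
        # Called by the command API after the command service appended events.
        async def _wait() -> None:
            async with self._changed:
                await self._changed.wait_for(
                    lambda: all(p >= position for p in self._positions.values())
                )

        try:
            await asyncio.wait_for(_wait(), timeout_s)
            return True
        except asyncio.TimeoutError:
            return False
```

The command API would call `wait_for` with the position of the last produced event and return the response only when it gets `True`. What to do on a timeout (fail the request or respond anyway) is a separate design decision.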

alexeyzimarev • Nov 13 '22 10:11

The drawback here is that the application can't be freely scaled out. When it does, it will have the same subscriptions processing events in parallel, which might create undesired side effects.

So, the issue is to ensure that only one subscription instance runs at a given time.
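One common way to get "only one active instance" is a lease with a time-to-live. This is a rough Python sketch of the idea and not something the thread settles on. `InMemoryLeaseStore` is a hypothetical in-process stand-in. A real deployment would need a shared store with atomic conditional writes, which this toy does not provide.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    owner: str
    expires_at: float


class InMemoryLeaseStore:
    """In-process stand-in for a shared lease store (assumption for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._lease: Optional[Lease] = None
        self._clock = clock

    def try_acquire(self, owner: str, ttl_s: float) -> bool:
        """Acquire or renew the lease; returns True if `owner` holds it afterwards."""
        now = self._clock()
        current = self._lease
        if current is None or current.expires_at <= now or current.owner == owner:
            self._lease = Lease(owner, now + ttl_s)
            return True
        return False

    def release(self, owner: str) -> None:
        # Only the current holder may release the lease.
        if self._lease is not None and self._lease.owner == owner:
            self._lease = None
```

An instance would start its subscriptions only after `try_acquire` returns `True`, renew the lease well before it expires, and stop the subscriptions if a renewal fails.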

alexeyzimarev • Nov 13 '22 10:11

I noticed a potential issue if the solution is a monolith.

If you add a Gateway using RabbitMQ as your producer, the subscriptionId translates into the queue name. But then you can't add a subscription to listen to that gateway, because adding a subscription with the same Id throws an error.

[image]

Totteperera • Dec 04 '22 20:12

The gateway will listen to what? If it's a gateway between ESDB and RabbitMQ, you don't need to use the queue name as the subscription id.

But in any case, I need to check the queue name convention; it's been a while. It should be possible to override the queue name.

alexeyzimarev • Dec 04 '22 21:12

I understand. What I meant was a monolith setup where the aggregates are still decoupled.

Create a gateway: [image]

Create a subscription to listen to that gateway: [image]

An InvalidOperationException is thrown from AddSubscriptionBuilder in NamedRegistrationExtensions: [image]

So yes, an override to set the queue name would be great.

Totteperera • Dec 06 '22 12:12

I understand. I will ensure that it is possible to override the exchange name for the gateway producer and have a more fine-grained configuration for the subscription. You can already specify the exchange, but it should be possible to override the queue name. Conventions are good, but there must be a way to override them. Good point, thanks.
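To illustrate the convention-versus-override point, here is a toy Python model. It is not Eventuous code and does not mirror its registration API. `Registry`, `Registration` and the `queue_name` parameter are hypothetical names used only to show why deriving the queue name from a unique id collides, and how an explicit override decouples the two.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Registration:
    registration_id: str
    queue_name: str


class Registry:
    """Toy registry: ids must be unique, the queue name defaults to the id."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Registration] = {}

    def add(self, registration_id: str, queue_name: Optional[str] = None) -> Registration:
        if registration_id in self._by_id:
            # Models the duplicate-id failure described in the comments above.
            raise ValueError(f"duplicate registration id: {registration_id}")
        # Convention: queue name equals the id unless explicitly overridden.
        registration = Registration(registration_id, queue_name or registration_id)
        self._by_id[registration_id] = registration
        return registration
```

With the override, a second registration can use a distinct id such as `"orders-listener"` and still target the queue named `"orders"`.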

alexeyzimarev • Dec 06 '22 15:12

I did some tests with Cloud Run and it's a mess.

  • The container shuts down as soon as it finishes processing a request. Sometimes, the subscription gets just enough time to start.
  • I tried a self-call, and it causes a situation where the service is being shut down but still waits for the self-call to complete, so Cloud Run starts another instance with one more subscription.
  • Setting the min instance count doesn't guarantee it's the same instance.
  • Setting the max instance count to one doesn't work with a self-call, as Cloud Run spins up new instances wildly, trying to accommodate incoming calls when the existing instance seems "stuck" (fails to shut down).

Still need to try other services.

alexeyzimarev • May 02 '23 09:05

Looking at the Proto.Actor cluster providers, it seems possible to run elections on services like Azure Container Apps or Amazon Fargate (both EKS and ECS), as there's a way to get the list of members and allow them to communicate. It's still pseudo-serverless, as these are just container orchestrators. In purely serverless environments like Lambda or (***)Functions it isn't possible, and the same goes for Google Cloud Run: it also hides the implementation, even though it supports the Knative API.
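When every instance can see the member list, the simplest election is a deterministic rule applied to that list. The Python sketch below is only an illustration of that idea, not how Proto.Actor cluster providers work. It assumes all instances observe the same membership, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def elect_leader(members: Iterable[str]) -> Optional[str]:
    """Pick the lexicographically smallest member id; None if there are no members."""
    return min(members, default=None)


def should_run_subscriptions(self_id: str, members: Iterable[str]) -> bool:
    """An instance runs subscriptions only while it is the elected leader."""
    return elect_leader(members) == self_id
```

If instances briefly disagree about the membership, two of them can consider themselves the leader at once, so event handlers would still need to tolerate duplicate processing.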

My conclusion is that the only way to go is to run a generic Connector workload in a predictable environment (Kubernetes, or a wrapper around it) and have it use serverless workloads to do the actual job.

alexeyzimarev • Jun 06 '23 21:06