
Server Hooks

Open mudit3774 opened this issue 4 years ago • 6 comments

Is your feature request related to a problem? Please describe.

These are the problems we are trying to solve:

  1. We are trying to run Tile38 as a sidecar to an application that has high reads and low writes, and we want to keep data in sync between the different Tile38 instances. To do this, whenever a write command issued to Tile38 succeeds, we publish a message with the command details to a Kafka topic. Each app instance additionally runs a consumer which receives these messages and applies the changes to its local Tile38 (unless the message was published by that same instance). This approach improves the latency and availability of running Tile38 (a sketch of this sync loop appears below).

  2. We are running Tile38 in stateless containers, so we periodically sync the AOF produced by the Tile38 nodes to S3. On startup, before loading the AOF, we download the latest file. If we are also using (1), we replay the Kafka messages from the last 'x' minutes to make sure we don't miss any updates.

We want to solve these problems using Tile38 hooks. The hooks would only trigger this logic; the logic itself would live outside the main Tile38 codebase.
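
A minimal sketch of the sync loop described in (1), assuming the github.com/segmentio/kafka-go client and a per-instance ID used to filter out self-published writes (the client, event format, and ID scheme are illustrative choices, not something Tile38 prescribes):

```go
// Sketch of the sidecar sync loop from (1), using github.com/segmentio/kafka-go.
// The client, topic layout, and instance-ID scheme are illustrative only.
package kafkasync

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// writeEvent carries the details of one successful Tile38 write command.
type writeEvent struct {
	Instance string   `json:"instance"` // which sidecar produced the write
	Cmd      string   `json:"cmd"`      // e.g. "SET"
	Args     []string `json:"args"`     // e.g. ["fleet", "truck1", "POINT", "33.5", "-112.2"]
}

const self = "instance-a" // this instance's ID (hypothetical)

// Publish is called after a write command succeeds against the local Tile38.
func Publish(ctx context.Context, w *kafka.Writer, cmd string, args []string) error {
	b, err := json.Marshal(writeEvent{Instance: self, Cmd: cmd, Args: args})
	if err != nil {
		return err
	}
	return w.WriteMessages(ctx, kafka.Message{Value: b})
}

// Consume applies writes published by other instances to the local Tile38
// via the supplied apply callback.
func Consume(ctx context.Context, r *kafka.Reader, apply func(cmd string, args []string) error) {
	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			return // context canceled or reader closed
		}
		var ev writeEvent
		if err := json.Unmarshal(m.Value, &ev); err != nil {
			continue
		}
		if ev.Instance == self {
			continue // skip writes this instance published itself
		}
		if err := apply(ev.Cmd, ev.Args); err != nil {
			log.Printf("apply %s: %v", ev.Cmd, err)
		}
	}
}
```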

Describe the solution you'd like

We would like to add support for the following hooks in Tile38 (a sketch of a possible interface follows the list):

  1. Pre-AOF-load hook
  2. Pre-server-start hook (before the health check passes)
  3. Post-command-execution hooks for write commands (in the `handleInputCommand` function)
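
A minimal sketch of what such a hook interface might look like; the interface and method names below are illustrative, not taken from the actual POC:

```go
// Hypothetical hook interface; the names (ServerHooks, PreAOFLoad, PreStart,
// PostWriteCommand) are illustrative and not taken from the actual POC/PR.
package hooks

import "context"

// ServerHooks is the extension point a user-supplied module would implement.
type ServerHooks interface {
	// PreAOFLoad runs before the AOF file is read, e.g. to fetch it from S3.
	PreAOFLoad(ctx context.Context) error
	// PreStart runs before the server accepts traffic and before the
	// health check can pass, e.g. to replay missed Kafka messages.
	PreStart(ctx context.Context) error
	// PostWriteCommand runs after a write command succeeds, e.g. to
	// publish the command details to a Kafka topic.
	PostWriteCommand(ctx context.Context, cmd string, args []string) error
}

// NoopHooks is the default implementation Tile38 would ship with: every
// method does nothing, so behavior is unchanged unless the user binds a
// custom module and enables hooks in the config.
type NoopHooks struct{}

func (NoopHooks) PreAOFLoad(context.Context) error { return nil }
func (NoopHooks) PreStart(context.Context) error   { return nil }
func (NoopHooks) PostWriteCommand(context.Context, string, []string) error {
	return nil
}
```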

We did a small POC to validate this. Please look at the comments in the diff.

Please note that these hooks would just be called with some context (and only if enabled in the config), and Tile38 would provide a no-op implementation for them. Users can bind their own custom module to this.

Users could then inject functionality into Tile38 through hooks without the overhead of maintaining a fork.

Hooks could also be used to achieve some of the things mentioned in this ticket.

Describe alternatives you've considered

For problem (1), we considered three places where the logic could live: the application using Tile38, the client library connecting to Tile38, or the Tile38 server itself. This is the comparison:

| Criteria | App (using Tile38) | Redis Client | Tile38 Server |
| --- | --- | --- | --- |
| Development | Violates DRY; repeated work. (HIGH) | Would need to be implemented for every Redis client and language. (MEDIUM) | Can be implemented centrally and distributed. (LOW) |
| Maintainability | Must be implemented by each application; adoption would be low. | Must maintain multiple clients in multiple languages. | Must fork and maintain Tile38, although changes would be isolated in clustered mode. |
| Complexity (replay) | Must implement replay once the health check passes. | Must replay once the Tile38 health check passes. | Health check can pass only after AOF reload and replay are done. |
| Complexity (consumer group management) | Hard to clean up unused consumer groups. | Easy to clean up unused consumer groups. | Easy to clean up unused consumer groups. |
| Distribution | Left to the application. | Sidecar + custom client. | Clustered sidecar. |

For problem (2), we considered writing a separate script, but the drawback is that alerting, monitoring, etc. would have to be set up for that separate process.

Additional context

  1. We are using the Kafka approach for HA instead of Sentinel because, as a company, we have expertise in maintaining a stable Kafka cluster.

  2. We are uploading the AOF to S3 instead of using a persistent disk because of limitations of the Kubernetes cluster that we run internally and have expertise in.

@tidwall / others, please let us know what you think about this.

mudit3774 avatar Dec 21 '19 15:12 mudit3774

In your case, I can see the value of having event hooks during startup.

But I'm wondering if it's too specific to add to the master branch. Tile38 currently has replication, which effectively serves the same purpose: one leader (writer) propagates all changes to its multiple followers (readers).

The Kafka requirement may make replication a no-go, but have you considered using the built-in replication?
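
For reference, the built-in replication is driven by Tile38's `FOLLOW` command. A minimal sketch of pointing a replica at a leader, using the go-redis client and made-up host names (Tile38 speaks RESP, so any Redis client works):

```go
// Minimal sketch: Tile38 speaks RESP, so a stock Redis client can configure
// replication. Host names and the go-redis client choice are illustrative.
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Connect to the instance that should become a follower.
	follower := redis.NewClient(&redis.Options{Addr: "tile38-replica:9851"})

	// FOLLOW host port: start replicating all writes from the leader.
	if err := follower.Do(ctx, "FOLLOW", "tile38-leader", 9851).Err(); err != nil {
		panic(err)
	}

	// "FOLLOW none" would later promote this instance back to a leader.
}
```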

I tried to take a peek at your POC, but it's currently 404.

tidwall avatar Dec 27 '19 14:12 tidwall

Thanks for the reply @tidwall. Apologies for the delay in response.

For various reasons we had to remove the fork, but I have raised another PR with the implementation.

We are just proposing a simple framework for pre-cmd and pre-start hooks so that people can hook in their logic without maintaining a fork of Tile38.

Some sample requirements are Kafka-based replication (pre-cmd) and downloading the AOF file from S3 before start-up (pre-start). Please note that the PR does not include these sample implementations; it only has a base no-op implementation. We don't intend to raise those against master unless you think they would be useful.
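
For illustration, a sketch of what such a pre-start hook could look like, assuming the aws-sdk-go v1 S3 downloader and hypothetical bucket/key/path values (not the actual PR code):

```go
// Sketch of a user-supplied pre-start hook that restores the latest AOF
// from S3 before Tile38 loads it. Bucket, key, and file path are
// hypothetical, and the hook signature mirrors the interface sketched
// earlier in this thread, not the actual PR.
package s3hooks

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// S3Restore downloads the most recently uploaded AOF so a fresh, stateless
// container starts from the newest snapshot; Kafka replay covers the rest.
type S3Restore struct {
	Bucket string // e.g. "tile38-backups"
	Key    string // e.g. "latest/appendonly.aof"
	Path   string // e.g. "data/appendonly.aof"
}

func (h S3Restore) PreStart(ctx context.Context) error {
	sess := session.Must(session.NewSession())
	dl := s3manager.NewDownloader(sess)

	f, err := os.Create(h.Path)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = dl.DownloadWithContext(ctx, f, &s3.GetObjectInput{
		Bucket: aws.String(h.Bucket),
		Key:    aws.String(h.Key),
	})
	return err
}
```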

We did consider using the built-in replication but decided against it, for these reasons:

  1. Operational Complexity: We wanted high availability, and hence auto-failover in case of failures. This would have required us to deploy both Tile38 and Sentinel across multiple availability zones, and to maintain separate alerting and monitoring for Sentinel on top of Tile38's.

  2. High Availability: As discussed in this issue, there is no built-in support for HA, and we did not have expertise in maintaining Sentinel, so we opted for the Kafka-based solution since we have operational expertise with Kafka.

Let me know if this makes sense. I will raise the same PR with more tests and documentation against master for review.

mudit3774 avatar Jan 03 '20 08:01 mudit3774

Hey @tidwall please have a look at the PR whenever you have time. Let me know if any further clarification is needed.

mudit3774 avatar Jan 13 '20 15:01 mudit3774

@mudit3774 Sorry for the wait. After some thought, I think it's best to hold off on adding these features to the main branch. They seem like narrow features specific to your use case, and I'm not sure they would benefit other Tile38 users.

tidwall avatar Jan 15 '20 19:01 tidwall

@tidwall any chance of revisiting this? We are seeing firsthand the complexity of setting up a Tile38 / Sentinel cluster that spans multiple data centers. The suggested hooks, and their approach of keeping (n) masters in sync via a sidecar, are interesting. Pre-command / post-command hooks may have other interesting uses as well.

rave-eserating avatar Jun 23 '22 19:06 rave-eserating

This would benefit many Tile38 users interested in running Tile38 in production at scale. +1 here.

rsvancara avatar Dec 23 '23 14:12 rsvancara