
ASB WebSubHub 2.0

Open shafreenAnfar opened this issue 2 years ago • 2 comments

Before I get into the details of the 2.0 design, let's first look at the 1.0 design and its shortcomings.

Following is the high-level architecture diagram for 1.0.

[Figure: ASB WebSubHub 1.0 architecture diagram]

The high-level design is similar to event sourcing in Event-Driven Architecture (EDA). There are four types of events.

  1. Topic registered event
  2. Topic deregistered event
  3. Subscribed event
  4. Unsubscribed event

The first two events are stored in the websub-topic topic, whereas the last two are stored in the websub-subscribers topic. The consolidator then consolidates these events and writes them back to websub-consolidated-topics and the corresponding consolidated subscribers topic. Hubs listen to these consolidated topics to get notified every time there is a change.

Following is the Hub state.

Hub State = Consolidated websub topics + Consolidated websub subscribers
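
As a rough illustration of how the four event types fold into this state, here is a minimal sketch. Python is used only for illustration; the event field names (`type`, `topic`, `callback`) and the `HubState` shape are assumptions, not taken from the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HubState:
    # Registered topic names.
    topics: set = field(default_factory=set)
    # Topic name -> set of subscriber callback URLs (hypothetical shape).
    subscribers: dict = field(default_factory=dict)


def apply_event(state, event):
    """Apply one of the four event types to the hub state in place."""
    kind = event["type"]
    topic = event["topic"]
    if kind == "topic_registered":
        state.topics.add(topic)
    elif kind == "topic_deregistered":
        state.topics.discard(topic)
    elif kind == "subscribed":
        state.subscribers.setdefault(topic, set()).add(event["callback"])
    elif kind == "unsubscribed":
        state.subscribers.get(topic, set()).discard(event["callback"])
    else:
        raise ValueError(f"unknown event type: {kind}")
    return state
```

Replaying every event in order through `apply_event` reproduces the hub state, which is the event-sourcing property the design relies on.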

When a hub restarts, it notifies the consolidator and asks for the latest consolidated messages.

The problems with the current design are as follows.

  1. Events have to make the full round trip before hubs update their state (hub -> consolidator, consolidator -> hub)
  2. Consolidated messages have the potential to grow into very large messages
  3. A hub has to go through the entire consolidated message to figure out what has changed
  4. If the consolidator restarts, it has to recover its state from the consolidated topics, but the messages in those topics could have expired

Following is the proposed new design.

[Figure: ASB WebSubHub 2.0 architecture diagram]

With the new design, all the events are sequenced into one series of events. All those events are stored in one topic as they are produced, and are then broadcast to all the hubs.

When a hub restarts, it can request the latest state from the consolidator. The consolidator retrieves this from the database and sends it to the hub. To make this possible, the consolidator stores the data for each event, similar to version 1. This is an area we can improve as we move on.
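
A minimal sketch of that consolidator behaviour, assuming each event carries a sequence number. Python is used only for illustration; the `Consolidator` class, the snapshot layout, and the plain `dict` standing in for the database are all hypothetical.

```python
import json


class Consolidator:
    """Keeps the consolidated state and persists it after every event."""

    def __init__(self, db):
        # `db` is a dict standing in for the real database (assumption).
        self.db = db
        saved = db.get("snapshot")
        if saved is None:
            self.snapshot = {"last_seq": 0, "topics": [], "subscribers": {}}
        else:
            # After a restart, pick up where the stored state left off.
            self.snapshot = json.loads(saved)

    def on_event(self, seq, event):
        if seq <= self.snapshot["last_seq"]:
            return  # already reflected in the stored state
        topics = self.snapshot["topics"]
        subs = self.snapshot["subscribers"]
        kind, topic = event["type"], event["topic"]
        if kind == "topic_registered":
            if topic not in topics:
                topics.append(topic)
        elif kind == "topic_deregistered":
            if topic in topics:
                topics.remove(topic)
        elif kind == "subscribed":
            callbacks = subs.setdefault(topic, [])
            if event["callback"] not in callbacks:
                callbacks.append(event["callback"])
        elif kind == "unsubscribed":
            if event["callback"] in subs.get(topic, []):
                subs[topic].remove(event["callback"])
        self.snapshot["last_seq"] = seq
        # Persist after every event so a restart loses nothing.
        self.db["snapshot"] = json.dumps(self.snapshot)

    def latest_state(self):
        """Return a copy of the state to send to a restarting hub."""
        return json.loads(json.dumps(self.snapshot))
```

Because the snapshot records the sequence number of the last event it includes, a hub that receives it knows exactly which later events it still has to apply.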

When a hub (H1) restarts, the following could happen.

[Figure: hub state event series]

When H1 restarts, it gets the consolidated state as of E6, although it had stopped at E3. Therefore, it can ignore all the messages up to and including E6 and start processing from that point onwards. This is possible because every message is sequenced.
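
The skip logic can be sketched as follows. Python is used only for illustration; the snapshot and event shapes are assumptions, and for brevity the sketch tracks topics only.

```python
def resume_hub(snapshot, events):
    """Rebuild hub state from a snapshot, then apply only newer events.

    `snapshot` is a dict with `last_seq` and `topics` (hypothetical shape).
    `events` is an iterable of (sequence_number, event) pairs in order.
    """
    topics = set(snapshot["topics"])
    last_seq = snapshot["last_seq"]
    for seq, event in events:
        if seq <= last_seq:
            continue  # already included in the snapshot, so ignore it
        if event["type"] == "topic_registered":
            topics.add(event["topic"])
        elif event["type"] == "topic_deregistered":
            topics.discard(event["topic"])
        last_seq = seq
    return topics, last_seq
```

With a snapshot taken at E6 and a stream redelivered from E4, the hub skips E4 to E6 and applies E7 onwards.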

When the consolidator restarts, it can load the latest state from the database and get back to work. The new version has none of the problems version 1 had. It is also simpler and follows a proper event sourcing pattern.

shafreenAnfar avatar Mar 27 '23 07:03 shafreenAnfar

Currently, the Azure WebSubHub system utilizes the revised topic architecture but has not yet implemented a NoSQL database-based setup for state management. As a result, the persisted state snapshot may be lost if no new state updates occur for an extended period. For instance, if the state snapshot is not updated within 30 days, the corresponding ASB topic will lose the state message.

To address this issue, we have two potential approaches:

  1. Migrate to a NoSQL Database-Based Setup: Transition the existing state management system to a NoSQL database to ensure reliability.
  2. Implement a Shutdown Hook for State Updates: Introduce a feature that updates the current state snapshot whenever the consolidator is terminated, using a shutdown-hook-like mechanism. [1]

Given the critical nature of this limitation and the time required to implement a NoSQL-based solution, we will proceed with option (2) as an immediate fix. The migration to a NoSQL database-based setup (option (1)) will be planned for a future update.

[1] - https://ballerina.io/learn/by-example/stop-handler/
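
For illustration, option (2) could look roughly like the sketch below. It is written in Python rather than Ballerina, so `atexit` and a SIGTERM handler stand in for the stop handler in [1]; the `SnapshotPublisher` class and the `get_state` / `publish` callables are hypothetical.

```python
import atexit
import json
import signal
import sys


class SnapshotPublisher:
    """Publishes the current state snapshot once, when the process stops."""

    def __init__(self, get_state, publish):
        self.get_state = get_state  # returns the current state as a dict
        self.publish = publish      # sends a JSON string to the state topic
        self.done = False

    def flush(self):
        if self.done:
            return  # avoid publishing twice if several hooks fire
        self.publish(json.dumps(self.get_state()))
        self.done = True


def install_shutdown_hook(publisher):
    # Runs on normal interpreter exit.
    atexit.register(publisher.flush)

    def _on_sigterm(signum, frame):
        # Turn SIGTERM into a normal exit so the atexit hook runs.
        sys.exit(0)

    signal.signal(signal.SIGTERM, _on_sigterm)
```

Note that a hook like this only runs on graceful termination; a hard kill or node failure would skip it, which is part of why option (1) remains on the roadmap.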

@shafreenAnfar @janethavi FYI

ayeshLK avatar Jan 21 '25 09:01 ayeshLK

  1. Migrate to a NoSQL Database-Based Setup: Transition the existing state management system to a NoSQL database to ensure reliability.

@ayeshLK Instead of using a NoSQL database, shall we consider using a PersistentVolume [1][2] for this purpose? In our opinion it would be a more suitable method than a NoSQL DB, since this is just about keeping a single JSON entity attached to a k8s deployment.

cc:- @chalitha1989

[1] https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/

[2] https://learn.microsoft.com/en-us/azure/aks/concepts-storage
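
A minimal sketch of what keeping that single JSON entity on a mounted volume might involve, in Python for illustration only. The function names are hypothetical, and the path passed in would be wherever the PersistentVolume is mounted in the pod (an assumption of this sketch). The write goes through a temporary file and a rename so that a crash mid-write does not leave a truncated snapshot.

```python
import json
import os
import tempfile


def save_snapshot(path, snapshot):
    """Write the snapshot atomically to a file on the mounted volume."""
    directory = os.path.dirname(path) or "."
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # replaces the old snapshot in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_snapshot(path):
    """Return the stored snapshot, or None if nothing has been saved yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```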

udhanMti avatar Jan 29 '25 08:01 udhanMti