aws-lambda-stream
Seeding a new subsystem
How would you seed a new subsystem with historical data?
Example use case:
Subsys X has an egress ESG that emits external events and stores them in its event lake.
We create Subsys Y and a routing rule that sends external events emitted by other subsystems to its event bus, but Subsys Y also needs historical data.
Possible solution:
A script similar to the aws-lambda-stream-cli that reads from Subsys X's event lake and sends the events to Subsys Y's event bus. It could work, but I fear it would send duplicates to Subsys Y's event lake.
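A rough sketch of the core of such a seeding script, assuming the lake records have already been read and parsed into objects with an `id` and a `type` (the `LakeEvent` shape, the bus name and the source value are hypothetical). It drops repeated ids in the source data and groups the rest into batches of at most 10, which is the EventBridge limit for one PutEvents request. It does not stop Subsys Y's delivery stream from storing the seeded events in Y's own lake; that depends on Y's lake rule.

```typescript
// Hypothetical shape of an event read back from the event lake (assumption).
interface LakeEvent {
  id: string;
  type: string;
  timestamp: number;
  [key: string]: unknown;
}

// Mirrors the fields of an EventBridge PutEvents request entry.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string;
}

// EventBridge accepts at most 10 entries per PutEvents request.
const MAX_ENTRIES_PER_REQUEST = 10;

function toPutEventsBatches(
  events: LakeEvent[],
  busName: string,
  source: string,
): PutEventsEntry[][] {
  // Skip repeated event ids so the script itself does not add extra copies.
  const seen = new Set<string>();
  const entries: PutEventsEntry[] = [];
  for (const e of events) {
    if (seen.has(e.id)) continue;
    seen.add(e.id);
    entries.push({
      EventBusName: busName,
      Source: source,
      DetailType: e.type,
      Detail: JSON.stringify(e), // whole event as the detail payload
    });
  }
  // Split the entries into request-sized batches.
  const batches: PutEventsEntry[][] = [];
  for (let i = 0; i < entries.length; i += MAX_ENTRIES_PER_REQUEST) {
    batches.push(entries.slice(i, i + MAX_ENTRIES_PER_REQUEST));
  }
  return batches;
}
```

Each batch would then be sent as the `Entries` of one PutEvents call (for example via `PutEventsCommand` from `@aws-sdk/client-eventbridge`), checking `FailedEntryCount` in the response before moving on.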
Has anyone faced this problem before?
I think the most straightforward solution is to replay the events of interest to X's egress gateway, which would make it publish the external events that will seed the Y subsystem.
That means you will get a lot of duplicated external events in both the X and Y event lakes. Or should we not store external events in the event lakes?
No. I have an eventPattern: { source: ['custom'] } rule for the delivery stream that pushes events into the event lake. That means only internal events get pushed into the lake. But honestly, even if they were saved into the lake, I don't see a problem here (unless it's terabytes of events you're trying to replay).
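A simplified illustration of how the two lake rules discussed here differ, assuming internal events carry source 'custom', external events carry source 'external', and the "anything-but fault" filter is applied to detail.type (the field names and source values are assumptions, and these functions only mimic EventBridge matching for these two cases rather than reproducing it).

```typescript
// Simplified model of an event on the bus (assumption: only the fields
// the two rules below look at).
interface BusEvent {
  source: string;
  detail: { type: string };
}

// Rule A: eventPattern { source: ['custom'] }.
// Only events whose source is exactly 'custom' go to the lake.
const matchesCustomSourceOnly = (e: BusEvent): boolean =>
  ['custom'].includes(e.source);

// Rule B: an "anything-but fault" filter (assumed here to be on detail.type).
// Every event except faults goes to the lake, whatever its source.
const matchesAnythingButFault = (e: BusEvent): boolean =>
  e.detail.type !== 'fault';

const events: BusEvent[] = [
  { source: 'custom', detail: { type: 'thing-submitted' } },   // internal
  { source: 'external', detail: { type: 'thing-submitted' } }, // external
  { source: 'custom', detail: { type: 'fault' } },             // fault
];

// Rule A keeps the internal event and the fault; the external event is dropped.
const lakeA = events.filter(matchesCustomSourceOnly);
// Rule B keeps the internal and external events; the fault is dropped.
const lakeB = events.filter(matchesAnythingButFault);
```

Under rule A, replaying through X's egress adds nothing to the lakes, because the republished external events never match; under rule B they are stored again.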
Or should we not store external events in the event lakes?
I didn't notice the second question at first. I don't have a definite answer. Right now, John Gilbert's S3 template saves anything but faults, which means external events are saved too. In my lakes I have source: ['custom'], which I derived from an earlier version of the template, so it was like that at some point. But logically, external events can't be replayed to anything within a subsystem, so there seems to be little value in storing them in the S3 lake, except perhaps for statistics.
Maybe John himself would want to say something on this topic.
Thanks for your insights.
If you have multiple downstream subsystems, do you still replay events through the egress? We usually have an idempotency table in our ingresses that expires records after 30 days to avoid processing duplicates. Should we keep those records forever so that we can replay as much as we want?
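A minimal in-memory sketch of the trade-off in the question, assuming the ingress checks each event id against a table whose records expire after a TTL (the class and method names are hypothetical; in practice this would typically be a DynamoDB table with TTL enabled). Because DynamoDB removes expired items some time after expiry rather than immediately, the check compares the stored expiry explicitly instead of relying on the record being gone.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical stand-in for an ingress idempotency table with expiring records.
class IdempotencyTable {
  private readonly expiresAt = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true if the event should be processed: either its id was never
  // seen, or its previous record has expired. Records the id with a new expiry.
  checkAndRecord(eventId: string, now: number): boolean {
    const expiry = this.expiresAt.get(eventId);
    if (expiry !== undefined && expiry > now) {
      return false; // duplicate within the TTL window, skip it
    }
    this.expiresAt.set(eventId, now + this.ttlMs);
    return true;
  }
}

const table = new IdempotencyTable(30 * DAY_MS);
const t0 = Date.UTC(2024, 0, 1);
const first = table.checkAndRecord('evt-1', t0);                      // processed
const retry = table.checkAndRecord('evt-1', t0 + 10 * DAY_MS);        // skipped
const lateReplay = table.checkAndRecord('evt-1', t0 + 41 * DAY_MS);   // processed again
```

With a 30-day TTL, a replay of events older than the window is treated as new by every downstream ingress; keeping records forever closes that gap at the cost of an ever-growing table.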