replicator
Checkpoint store in the target cluster
I've encountered jumps back in time when reading a stream from a replica Event Store, even though in the original Event Store the timestamps embedded in my data always increased. I wonder whether this is a misconfiguration on my part, or whether it will always happen whenever I restart the replicator. My replica Event Store was built from an older backup of the /var/lib/eventstore directory where all the chunks live, and has since been fed only by the Replicator.
What do you mean by "jumped back in time"? If you don't do any transformation, no data is changed (except for the event timestamp, which is system metadata and should not be used for any application purpose). Replicator uses a checkpoint from the source store in order to know where it is. And as you know, timestamps and time are not reliable; servers do go back and forward in time when they synchronize their clock with an NTP server.
Thanks for the quick reply. By the way, the replicator is quite solid; I especially like how easy it is to set up based on your docker-compose example. I probably did something wrong. My transformation function is: function transform(original) { return original; }
I use the timestamp from our metaData inside the event (copied by the replicator), not the system metadata event timestamp, which is freshly stamped on every replicated event. By "jumping back in time" I mean jumping back by 8 days at a certain point when reading our custom ALL stream from the replica Event Store, which is filled by the following projection, actively running in both Event Stores:
fromAll()
    .when({
        $any: function (stream, event, metaData) {
            linkTo('ALL', event, metaData);
        }
    });
Maybe I should exclude such streams from replication, or disable such projections in the Replica Event Store?
That seems strange; what are you using that replica for? You know you have the $all stream that you can read from or subscribe to, and most clients we maintain support that, if not now, then in the very near future.
I think the reason for the difference is that $all additionally contains tons of system events like $statsCollected, $statistics-report, $result, $state, etc., while we just wanted a stream to read all our events from, regardless of which stream they were originally published to.
Our replica Event Store is used for running heavier projections for monitoring and debugging purposes, which could otherwise degrade the main Event Store performance.
I guess you're on v5.x. If you upgrade to v21.10.x, those stats-related streams are no longer in the database itself ($result and $state are still in), and you get server-side filtered reads as a bonus with the gRPC-based clients: https://developers.eventstore.com/clients/grpc/subscriptions.html#filter-options
In addition, versions 20+ allow applying server-side filters on subscriptions to $all, which is a much better way to get the same outcome, as it doesn't require running a replica server.
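For illustration, a filtered subscription to $all with the Node.js gRPC client looks roughly like this. It's a minimal sketch, not an official recipe; the connection string and the choice to start from the beginning are assumptions for a local, insecure node:

const { EventStoreDBClient, excludeSystemEvents, START } = require("@eventstore/db-client");

// Assumed connection string for a local, insecure single node.
const client = EventStoreDBClient.connectionString("esdb://localhost:2113?tls=false");

async function readAllUserEvents() {
    // The filter is applied server-side, so system events never cross the wire.
    const subscription = client.subscribeToAll({
        fromPosition: START,
        filter: excludeSystemEvents(),
    });

    for await (const resolvedEvent of subscription) {
        console.log(resolvedEvent.event?.type, resolvedEvent.event?.streamId);
    }
}

readAllUserEvents();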
It's also possible to use a server-side projection to link all the events to a certain stream, so it becomes a pseudo-$all. But then again, the only use case I am aware of for this is persistent subscriptions. And we will soon release the client that supports persistent subscriptions to $all, as the server has supported it since 21.6 (I believe).
Using the target server to store checkpoints essentially doubles the write load on the target, so I've always had doubts about the usefulness of this feature...
I think you are right and storing the checkpoint in a file configurable via replicator.yml is optimal.
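For reference, the file-based checkpoint store is configured in replicator.yml roughly like this. The key names below are an assumption from memory of the Replicator docs, so treat this as a sketch and check the current configuration reference:

replicator:
  checkpoint:
    # Assumed key layout; path to the local checkpoint file.
    path: "./checkpoint.json"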
The use case that I'm missing is being able to bootstrap the target by copying the /var/lib/eventstore directory from the source Event Store, to give the whole replication process a head start. Currently, I think only an empty target Event Store is supported, until the checkpoint file has been written by the replicator for the first time.
I'm using Event Store 5.0.9.0 and Replicator 0.4.1
By unsupported I mean exactly these jumps backwards in time by multiple days. Example: I wrote a Python script which prints the first event of a stream and then scans the rest of it for temporal inconsistencies:
Fetching events from http://localhost:2113/streams/Assignment-Stream/0/forward/4000
2021-11-16T14:20:56.934Z 498613ec-2496-4d4f-9090-2aa1733a0ca1 EntityAssigned Assignment-Stream
.................................3339: timestamp jumped back in time by -P18DT6H4M41.791S within Assignment-Stream:
2021-12-04T20:25:38.725Z 7b82011c-87b3-4144-bf32-68c8de2d7bed EntityUnassigned Assignment-Stream
2021-11-16T14:20:56.934Z 498613ec-2496-4d4f-9090-2aa1733a0ca1 EntityAssigned Assignment-Stream
.........
2021-12-04 is the date of the last backup of the /var/lib/eventstore directory from the source ES (copied over to the target ES before starting the Replicator, to minimize the initial gap).
2021-11-16 is the date of the first event in the source ES, which, after starting the Replicator, now appears twice in my target ES, even with the same eventId, which I thought was supposed to be unique.
As a workaround I will simply start with an empty target Event Store. Clearly my use case could only work in the absence of any kind of transformation or filtering during replication - which I don't need at the moment.
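For anyone curious, the check the script performs is roughly the following. This is a sketch in Node.js rather than the original Python, and the stream URL, page size, and the timestamp field inside our own metadata are assumptions for illustration:

// Rough sketch of the temporal-consistency check described above.
// Assumes Node 18+ (built-in fetch) and an insecure local node.
const url = "http://localhost:2113/streams/Assignment-Stream/0/forward/4000?embed=body";

async function scan() {
    const response = await fetch(url, { headers: { Accept: "application/json" } });
    const feed = await response.json();

    // Atom feed pages list entries newest-first, so sort by event number to scan forward.
    const entries = [...feed.entries].sort((a, b) => a.eventNumber - b.eventNumber);

    let previous = null;
    for (const entry of entries) {
        // "timestamp" inside our own metadata is an assumed field name for this example.
        const meta = typeof entry.metaData === "string" ? JSON.parse(entry.metaData) : entry.metaData;
        const ts = new Date(meta.timestamp);

        if (previous && ts < previous.ts) {
            console.log(entry.eventNumber + ": timestamp jumped back in time within " + entry.streamId + ":");
            console.log("  " + previous.ts.toISOString() + " " + previous.eventId + " " + previous.eventType);
            console.log("  " + ts.toISOString() + " " + entry.eventId + " " + entry.eventType);
        }
        previous = { ts, eventId: entry.eventId, eventType: entry.eventType };
    }
}

scan();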