HosterCore

Edge Case: Move replication script generation logic into the `scheduler` service (instead of the `scheduler` client)

Status: Open. yaroslav-gwit opened this issue 3 months ago (0 comments).

This is an edge case that doesn't happen very often, and it is relatively easy to work around.

[screenshot]

The list of snapshots to send in a replication session is generated on the scheduler client. That list can be made outdated by the next snapshot job in the scheduler service's queue if both jobs target the same resource. This happens when a VM/Jail with the same name has two separate scheduled jobs added within a very short time of one another: the snapshot job takes a new snapshot and deletes an old one, which changes the original snapshot list. The replication job then fails because it can't physically find a snapshot it was told to send. This is a rare scheduling/locking issue that I didn't account for at the beginning.
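The failure mode above can be reproduced deterministically with a minimal Go sketch. It assumes a simple retention policy ("keep the newest N snapshots"). `pruneKeep` and `sendAll` are hypothetical helpers for illustration, not HosterCore functions. The client plans the replication from a copy of the list, a queued snapshot job then adds and prunes, and the plan now refers to a deleted snapshot.

```go
package main

import "fmt"

// pruneKeep models a snapshot job's retention (hypothetical): keep only the newest n.
func pruneKeep(snaps []string, n int) []string {
	if len(snaps) <= n {
		return snaps
	}
	return append([]string(nil), snaps[len(snaps)-n:]...)
}

// sendAll models the replication step (hypothetical): every planned snapshot
// must still exist, otherwise the job fails.
func sendAll(plan, existing []string) error {
	have := make(map[string]bool, len(existing))
	for _, s := range existing {
		have[s] = true
	}
	for _, s := range plan {
		if !have[s] {
			return fmt.Errorf("snapshot %q not found", s)
		}
	}
	return nil
}

func main() {
	snaps := []string{"snap-0", "snap-1", "snap-2"}

	// 1. The client generates the replication plan from the current list.
	plan := append([]string(nil), snaps...)

	// 2. Before the replication runs, a queued snapshot job for the same VM
	//    takes a new snapshot and prunes the oldest one.
	snaps = pruneKeep(append(snaps, "snap-3"), 3)

	// 3. The stale plan still references the deleted snapshot.
	fmt.Println(sendAll(plan, snaps)) // prints: snapshot "snap-0" not found
}
```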

The fix would be to move the "generate snapshots to replicate" logic from the client to the service. This way, the service can use a lock/mutex to prevent the snapshot list from being modified between generating the replication plan and sending the snapshots.
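A minimal Go sketch of the proposed fix, assuming the service keeps each resource's snapshot list in memory. `SchedulerService`, `TakeSnapshot`, `Replicate`, and the retention count are hypothetical names for illustration, not HosterCore's actual API. The replication plan is generated inside the service while holding a per-resource mutex, and snapshot jobs take the same mutex, so a snapshot cannot be pruned between planning and sending.

```go
package main

import (
	"fmt"
	"sync"
)

// resourceState holds one VM/Jail's snapshot list and the lock guarding it.
type resourceState struct {
	mu    sync.Mutex
	snaps []string // snapshot names, oldest first
}

// SchedulerService is a hypothetical in-memory model of the scheduler service.
type SchedulerService struct {
	mu        sync.Mutex                // guards the resources map only
	resources map[string]*resourceState // keyed by VM/Jail name
	keep      int                       // snapshots retained per resource
}

func NewSchedulerService(keep int) *SchedulerService {
	return &SchedulerService{resources: make(map[string]*resourceState), keep: keep}
}

// state returns the per-resource state, creating it on first use.
func (s *SchedulerService) state(name string) *resourceState {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.resources[name]
	if !ok {
		r = &resourceState{}
		s.resources[name] = r
	}
	return r
}

// TakeSnapshot models a snapshot job: add a snapshot, then prune the oldest
// ones beyond the retention limit, all under the resource's lock.
func (s *SchedulerService) TakeSnapshot(resource, snap string) {
	r := s.state(resource)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.snaps = append(r.snaps, snap)
	if len(r.snaps) > s.keep {
		r.snaps = r.snaps[len(r.snaps)-s.keep:]
	}
}

// Replicate generates the list of snapshots to send inside the service and
// sends them while still holding the resource's lock, so no snapshot job can
// delete a planned snapshot in between.
func (s *SchedulerService) Replicate(resource string, send func(snap string) error) ([]string, error) {
	r := s.state(resource)
	r.mu.Lock()
	defer r.mu.Unlock()
	plan := append([]string(nil), r.snaps...)
	for _, snap := range plan {
		if err := send(snap); err != nil {
			return nil, fmt.Errorf("sending %q: %w", snap, err)
		}
	}
	return plan, nil
}

func main() {
	svc := NewSchedulerService(3)
	for i := 0; i < 3; i++ {
		svc.TakeSnapshot("vm1", fmt.Sprintf("snap-%d", i))
	}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // replication job for vm1
		defer wg.Done()
		plan, err := svc.Replicate("vm1", func(string) error { return nil })
		fmt.Println("replicated:", plan, "err:", err)
	}()
	go func() { // second job for the same VM, scheduled right after
		defer wg.Done()
		svc.TakeSnapshot("vm1", "snap-3")
	}()
	wg.Wait()
}
```

Whichever goroutine wins the lock, the replication either sees the list before or after the snapshot job, never a half-updated one, so it always returns a nil error. Holding the lock for the whole send means a snapshot job for the same VM/Jail waits until replication finishes; unrelated resources are not blocked because each has its own mutex. A real implementation might instead only pin the planned snapshots against deletion, which would shorten the critical section.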

For now, the workaround is simply to re-run the replication (or wait for the next cron job to run it for you). The bug is mostly "innocent" (and quite rare), as it can't damage any data on the receiving end.

yaroslav-gwit, Apr 17 '24 20:04