electric icon indicating copy to clipboard operation
electric copied to clipboard

fix(client): Make the throttle skip a snapshot if one is already running

Open davidmartos96 opened this issue 9 months ago • 6 comments

This is a behavior we've seen when creating the test for the last PR (#1251). If you add a log to the mock snapshot function, you will see that when stopping, the process waits for a large amount of snapshots to finish before stopping completely. This is because when polling, they are added to a queue due to the snapshot mutex.

Is this intentional? Or should the throttle just skip the call if the lock is taken?

cc @kevin-dp @msfstef

davidmartos96 avatar May 16 '24 08:05 davidmartos96

This seems unintentional and i'm certainly in favor of skipping the periodic snapshot if a snapshot is already in progress. cc @balegas

kevin-dp avatar May 16 '24 09:05 kevin-dp

It sounds good to skip... in the correct conditions :)...

but I think this code might miss changes: by the time you check the lock is taken, the mutex area is already beyond the change-capturing phase, so you could miss the last new changes that were produced that triggered the snapshot you're evaluating.

Did I miss something? I haven't looked deeply.

balegas avatar May 16 '24 10:05 balegas

Looking at this again. If polling misses a tick because of a race, maybe it's okay, but do you know why you're observing a large amount of snapshots being queued? Is the polling time not configured to be large enough? or is the apply function taking the mutex too agressively?

balegas avatar May 16 '24 12:05 balegas

I agree with @balegas the current change makes it possible for explicit calls to _throttledSnapshot to not result in a snapshot - I think it makes sense to add this check for whether the lock is busy in the polling interval alone since missing a polling tick is the same as just having a slightly longer poll interval which is arbitrary either way.

When _throttledSnapshot is called explicitly, in the start method and as a result of _handleClientOutboundStarted, it should be guaranteed that it is queued IMO

msfstef avatar May 16 '24 13:05 msfstef

@balegas The large amount of snapshots is mostly because of how some tests are configured. The polling interval is 200ms and some tests fake a snapshot taking longer than usual, 500ms-1000ms. So the polling schedules a few snapshots, which need to be resolved before the process stops. I'm not sure if it could end up happening in a real scenario, but we wanted to open the discussion.

Since there are places that want to ensure that a snapshot task gets queued, there could be an optional parameter when running the snapshot. The polling for instance doesn't really care about the skip.

davidmartos96 avatar May 16 '24 13:05 davidmartos96

thanks for clarifying. In the case of a snpashot taking longer than the polling period they would get queued, but they wouldn't run after calling stop because the throttled function is cancelled. Is the behaviour really that they are executing before returning?

balegas avatar May 16 '24 13:05 balegas

Another way to think about this issue would be to always skip the throttled snapshot if the mutex is taken and flag the process with a boolean property if someone requested a snapshot while one was running. Then, before the mutex is released, run the snapshot again if needed. This would work as long as the caller of the throttled snapshot doesn't need the output result, which looking at the source code, no one does. So the worst case scenario would be waiting an extra snapshot instead of an arbitrary queue from the mutex.

davidmartos96 avatar May 20 '24 14:05 davidmartos96

yeah, that makes sense for me too. The polling a bit of a corner case, so I prefer to have simpler code even if we lose a tick in some situations (against my original comment).

balegas avatar May 20 '24 16:05 balegas

👋 we've been working the last month on a rebuild of the Electric server over at a temporary repo https://github.com/electric-sql/electric-next/

You can read more about why we made the decision at https://next.electric-sql.com/about

We're really excited about all the new possibilities the new server brings and we hope you'll check it out soon and give us your feedback.

We're now moving the temporary repo back here. As part of that migration we're closing all the old issues and PRs. We really appreciate you taking the time to investigate and create this improvement!

KyleAMathews avatar Aug 06 '24 13:08 KyleAMathews