restate

Trims should remove segments synchronously

Open jackkleeman opened this issue 9 months ago • 2 comments

Currently, when we snapshot with the trim flag, the trimmed segments are not necessarily removed from the logs config synchronously. Segment removal is handled asynchronously by the bifrost watchdog, and it's not batched across logs. This can cause quite a lot of thrashing on the logs config, so a segment may not be removed for some time. A few areas where I think there could be improvement:

  1. Is it strictly necessary to even perform the loglet trim, which necessitates write quorum, if the trim point is past the segment? Can we just do the metadata operation to remove the loglet and let the loglet read this and clear its own state in time?
  2. The logs mutations across several trim operations could be batched, so that we do fewer metadata operations and they complete sooner because there is less thrashing. The read-mutate-write loops can be quite painful when there are too many of them happening at once.
  3. Could we consider making segment removal a synchronous part of trim, and not return to the caller until it's done? From the perspective of a controller in cloud, we need to keep submitting snapshot / trim requests until we observe that a loglet relying on a node no longer exists. So just returning to say that the log has been trimmed past the tail of the loglet isn't sufficient, as the loglet is still present.
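To make point 2 concrete, here is a minimal sketch of why batching helps with a versioned read-mutate-write loop. Everything in it is hypothetical: `LogsConfig`, `MetadataStore`, and both `remove_segments_*` functions are illustrative stand-ins, not the actual bifrost or metadata store types. It only counts metadata writes; under real concurrency each extra write is also an extra chance for another writer's compare-and-set to fail and retry.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the logs config: log id -> segment ids in its chain.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct LogsConfig {
    pub segments: BTreeMap<u32, BTreeSet<u64>>,
}

/// Hypothetical versioned metadata store with compare-and-set writes.
pub struct MetadataStore {
    version: u64,
    value: LogsConfig,
    /// Number of successful writes, tracked only for illustration.
    pub writes: u64,
}

impl MetadataStore {
    pub fn new(value: LogsConfig) -> Self {
        Self { version: 0, value, writes: 0 }
    }

    pub fn read(&self) -> (u64, LogsConfig) {
        (self.version, self.value.clone())
    }

    /// Succeeds only if nobody wrote since `expected` was read.
    pub fn compare_and_set(&mut self, expected: u64, new: LogsConfig) -> bool {
        if self.version != expected {
            return false;
        }
        self.version += 1;
        self.value = new;
        self.writes += 1;
        true
    }
}

/// Remove every (log id, segment id) pair in `pending` with one read-mutate-write loop.
pub fn remove_segments_batched(store: &mut MetadataStore, pending: &[(u32, u64)]) {
    loop {
        let (version, mut config) = store.read();
        for (log_id, segment) in pending {
            if let Some(chain) = config.segments.get_mut(log_id) {
                chain.remove(segment);
            }
        }
        if store.compare_and_set(version, config) {
            return;
        }
        // Lost a race with another writer: re-read and retry.
        // (Cannot happen in this single-threaded sketch.)
    }
}

/// Baseline: one read-mutate-write loop per segment, as with unbatched removal.
pub fn remove_segments_one_by_one(store: &mut MetadataStore, pending: &[(u32, u64)]) {
    for item in pending {
        remove_segments_batched(store, std::slice::from_ref(item));
    }
}
```

With two pending removals across two logs, the one-by-one path performs two metadata writes and the batched path performs one, and both end up with the same config.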

jackkleeman, Apr 08 '25 08:04

> Is it strictly necessary to even perform the loglet trim, which necessitates write quorum, if the trim point is past the segment? Can we just do the metadata operation to remove the loglet and let the loglet read this and clear its own state in time?

It's not strictly necessary, but we don't have background purging to delete the data. Support for purging on log-servers should make this possible.

> The logs mutations across several trim operations could be batched, so that we do fewer metadata operations and they complete sooner because there is less thrashing. The read-mutate-write loops can be quite painful when there are too many of them happening at once.

That's a good point. I think we can do better here. Agreed.

> Could we consider making segment removal a synchronous part of trim, and not return to the caller until it's done? From the perspective of a controller in cloud, we need to keep submitting snapshot / trim requests until we observe that a loglet relying on a node no longer exists. So just returning to say that the log has been trimmed past the tail of the loglet isn't sufficient, as the loglet is still present.

That goes against batching, I'm afraid. My thinking was to let the watchdog batch pending operations into one. Perhaps we can fake the behavior by waiting at the API layer until those segments are removed, but I'm not sure it's worth the trouble.
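As a rough illustration of the "wait at the API layer" idea, the sketch below blocks a caller until a background worker has removed the segments it cares about, or a timeout expires. `SharedSegments`, `remove_batch`, and `wait_until_removed` are hypothetical names invented for this example, and a `Mutex` plus `Condvar` stands in for whatever change notification the real logs config offers.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical shared view of the segment ids still present in the logs config.
pub struct SharedSegments {
    present: Mutex<BTreeSet<u64>>,
    changed: Condvar,
}

impl SharedSegments {
    pub fn new(ids: impl IntoIterator<Item = u64>) -> Arc<Self> {
        Arc::new(Self {
            present: Mutex::new(ids.into_iter().collect()),
            changed: Condvar::new(),
        })
    }

    /// Called by the background worker (the watchdog in this discussion)
    /// after it has applied a batched removal.
    pub fn remove_batch(&self, ids: &[u64]) {
        let mut present = self.present.lock().unwrap();
        for id in ids {
            present.remove(id);
        }
        self.changed.notify_all();
    }

    /// Called at the API layer once the trim is accepted. Blocks until none of
    /// `ids` is present, or until `timeout` elapses. Returns true if all are gone.
    pub fn wait_until_removed(&self, ids: &[u64], timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        let mut present = self.present.lock().unwrap();
        while ids.iter().any(|id| present.contains(id)) {
            let remaining = deadline.saturating_duration_since(Instant::now());
            if remaining.is_zero() {
                return false;
            }
            // Releases the lock while waiting; re-check the condition on wake-up.
            let (guard, _timed_out) = self.changed.wait_timeout(present, remaining).unwrap();
            present = guard;
        }
        true
    }
}
```

The worker stays free to batch however it likes; the caller only observes the outcome, so the timeout decides how long a trim request is allowed to appear synchronous.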

AhmedSoliman, Apr 08 '25 08:04

> That goes against batching, I'm afraid

Not necessarily! In a world of concurrent snapshot / trim requests, you could still try to group them and handle the log updates all at once. Anyway, maybe blocking the trim operation isn't so important, as long as segment removal happens much more rapidly than it does today.
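One way to picture grouping concurrent requests while still answering each caller only after its segment is gone is sketched below. It is an assumption-laden illustration, not how bifrost works: `TrimRequest`, `Logs`, and `process_one_batch` are hypothetical, and a standard-library channel stands in for the request path.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical trim request: the segment to drop, plus a channel used to tell
/// the caller which config version removed it.
pub struct TrimRequest {
    pub segment: u64,
    pub done: Sender<u64>,
}

/// Hypothetical logs config with a version bumped on every update.
pub struct Logs {
    pub version: u64,
    pub segments: BTreeSet<u64>,
}

/// Block for the first request, drain whatever else is already queued, apply
/// the whole group as one config update, then acknowledge every caller.
/// Returns the batch size, or None once all senders are gone.
pub fn process_one_batch(rx: &Receiver<TrimRequest>, logs: &mut Logs) -> Option<usize> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while let Ok(next) = rx.try_recv() {
        batch.push(next);
    }

    // Single update covering the whole batch.
    for req in &batch {
        logs.segments.remove(&req.segment);
    }
    logs.version += 1;

    // Each caller unblocks only after the removal is visible.
    for req in &batch {
        // The caller may have given up waiting; ignore a closed channel.
        let _ = req.done.send(logs.version);
    }
    Some(batch.len())
}
```

Each caller blocks on its own `done` receiver, so from the outside the trim looks synchronous, while the config is still written once per batch rather than once per request.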

jackkleeman, Apr 08 '25 09:04