dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

feat(transaction): Use single hop in squashing when possible

Open dranikpg opened this issue 1 year ago • 6 comments

Idea:

If we determine that squashed commands form only a single batch, we can use ScheduleSingleHop() for that batch, which:

  • Reduces hops from 3 to 2 (no unlock)
  • Enables "quick" runs for single shard cases

Without contended keys it gives no improvements 😞, but with a contended it's potentially up to two times faster 🙂

dranikpg avatar Jan 04 '24 18:01 dranikpg

There are two options to running a single hop: either keep the base tx multi or not. With single-shard EVAL we decided to reduce it to a regular transaction, which could have even been done much easier by just removing the multi flag after proper initialization.

With MULTI/EXEC we have to handle replication, mainly sending the closing EXEC journal message with shard count, etc. If we decide to stick to non-multi single hops, we have to emulate it ourselves.

Instead, I changed the logic, so that if Schedule/ScheduleSingleHop will be called on a multi that hasn't scheduled yet, it'll do so according to the rules of Schedule or SSH. What is more, if we call SSH, we'll set the concluding bit, so UnlockMulti's job will be done when concluding execution, reducing the number of hops from 3 to 2 for multi-shard cases

dranikpg avatar Jan 07 '24 12:01 dranikpg

Still needs some polishment, I also need to check the replica side more in detail

dranikpg avatar Jan 08 '24 11:01 dranikpg

it's quite large change with lots of conditions. Do you think it is possible to split it into smaller ones?

I can split the core logic and then it's uses (eval, exec and squasher), but there is no way to test each part without the full changes. Besides they're all in separate places, so it's not the depth of changes, but the range, each part is independent

I do not see new tests besides that small unit test. We know for sure that we have a memory leak during the replication. If this PR fixes it, can you add a (py)test reproducing the slave leak and showing it's been fixed here?

"memory leak"? Our current "fix" disables multi mode, so we just don't send the replication report. I will add a test

dranikpg avatar Feb 03 '24 07:02 dranikpg

I wanted to simplify journaling so that stub transactions report directly to the main transaction, this way we don't need both ReportWritesSquashedMulti and FIX_ConcludeJournalExec at all, but with a single hop, we can't know the full journal number when the first shard concludes 😞

dranikpg avatar Mar 08 '24 10:03 dranikpg

consider rebasing

romange avatar Apr 03 '24 20:04 romange

Will do. This PR will become much easier because we don't need to use ScheduleSingleHop as it's no longer special

dranikpg avatar Apr 03 '24 20:04 dranikpg