dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

chore: lock free info command with replicaof v2

Open kostasrim opened this issue 3 months ago • 4 comments

Following #5774, this PR removes locking the replica_of_mu_ from info command and uses the thread locals && replica of v2 algorithm.

No more locking and blocking for INFO command during replicaof or takeover 🥳 🎉 🌮

kostasrim avatar Sep 28 '25 09:09 kostasrim

With all the changes around replication, I will follow up with a tidy PR 😄

kostasrim avatar Oct 02 '25 07:10 kostasrim

There is a problem with this design. You use shared_ptr to be able to access the object even if it's "released" in another thread. But ProtocolClient is designed to be destructed in its own thread. so suppose, ProtocolClient is created in thread A, and thread B holds the pointer to that object, and then A destroys the reference to ProtocolClient, the thread B will hold the only reference and essentially will call the destructor of ProtocolClient in the wrong thread.

romange avatar Oct 14 '25 23:10 romange

There is a problem with this design. You use shared_ptr to be able to access the object even if it's "released" in another thread. But ProtocolClient is designed to be destructed in its own thread. so suppose, ProtocolClient is created in thread A, and thread B holds the pointer to that object, and then A destroys the reference to ProtocolClient, the thread B will hold the only reference and essentially will call the destructor of ProtocolClient in the wrong thread.

I don't see any issue in destructing in a separate thread because by the time we call the destructor of ~ProtocolClient we have cleaned our resources. Specifically:

  1. shard_flows_ is empty
  2. sync_fb_ and acks_fb_ should have joined (I can add DCHECKS for this)
  3. base class ProtocolClient::sock_ is already Shutdown() && Closed() properly on the right thread so there is no IO involved or anything else

The destructor is "trivial" from that aspect (of course we had preemption because of the Await which I removed)

kostasrim avatar Oct 16 '25 15:10 kostasrim

I prefer the opposite:

  1. Submit a PR that does not change behavior: fixes bugs, refactor etc. Release it in 1.35
  2. Submit a clean PR that adds a lockless replication. Release it in 1.36

romange avatar Oct 16 '25 15:10 romange