valkey icon indicating copy to clipboard operation
valkey copied to clipboard

For MEETs, save the extensions support flag immediately during MEET processing

Open bentotten opened this issue 1 year ago • 3 comments

For backwards compatibility reasons, a node will wait until it receives a cluster message with the extensions flag before sending its own extensions. This leads to a delay in shard ID propagation that can corrupt nodes.conf with inaccurate shard IDs if a node is restarted before this can stabilize.

This fixes much of that delay by immediately triggering the extensions-supported flag during the MEET processing and attaching the node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes #774

bentotten avatar Jul 12 '24 12:07 bentotten

(force pushed with DCO)

bentotten avatar Jul 12 '24 12:07 bentotten

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.33%. Comparing base (9948f07) to head (6699506). Report is 127 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #778      +/-   ##
============================================
+ Coverage     70.23%   70.33%   +0.09%     
============================================
  Files           112      112              
  Lines         60602    61278     +676     
============================================
+ Hits          42566    43099     +533     
- Misses        18036    18179     +143     
Files with missing lines Coverage Δ
src/cluster_legacy.c 85.84% <100.00%> (+0.05%) :arrow_up:

... and 28 files with indirect coverage changes

codecov[bot] avatar Jul 12 '24 13:07 codecov[bot]

Sorry for dropping the ball on this one, @bentotten. Let me find some time next and get back to this thread.

PingXie avatar Aug 12 '24 03:08 PingXie

any update on this?

bentotten avatar Aug 29 '24 22:08 bentotten

Thanks for your patience, @bentotten! I think this change makes sense overall. I think we should also consider updating the extension_capable flag right after completing the handshake as below:

https://github.com/valkey-io/valkey/blob/089048d36485bf0d0d202221a7497aec0563bcdd/src/cluster_legacy.c#L3194

otherwise, I think we will have to wait for the next PONG message to fix the flag at:

https://github.com/valkey-io/valkey/blob/089048d36485bf0d0d202221a7497aec0563bcdd/src/cluster_legacy.c#L3070

it looks like adding the name here during the MEET causes sender to be valid during the next packet received, which hits this logic and frees the node/link before the handshake flag can be removed

Yeah, you are right. Setting the name during MEET alters the handshake sequence. I agree we should not make this change.

Along the same line, I'm now wondering whether we should connect the link and node during the MEET phase (via setClusterNodeToInboundClusterLink). Currently, we establish these "links" (between link and node) after the handshake is completed. Based on the existing implementation, I don't anticipate any issues with moving this logic earlier into the MEET phase. However, I would be a lot more comfortable if we could find a way to instruct clusterSendPing to explicitly set this extension-capable flag without relying on link->node.

PingXie avatar Sep 02 '24 00:09 PingXie

Based on some offline discussion, we're going to merge this to see if it helps with another issue and I will create an issue with Ping's comment to followup with.

madolson avatar Sep 10 '24 03:09 madolson