valkey
valkey copied to clipboard
For MEETs, save the extensions support flag immediately during MEET processing
For backwards compatibility reasons, a node will wait until it receives a cluster message with the extensions flag before sending its own extensions. This leads to a delay in shard ID propagation that can corrupt nodes.conf with inaccurate shard IDs if a node is restarted before this can stabilize.
This fixes much of that delay by immediately triggering the extensions-supported flag during the MEET processing and attaching the node to the link, allowing the PONG reply to contain OSS extensions.
Partially fixes #774
(force pushed with DCO)
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 70.33%. Comparing base (
9948f07) to head (6699506). Report is 127 commits behind head on unstable.
Additional details and impacted files
@@ Coverage Diff @@
## unstable #778 +/- ##
============================================
+ Coverage 70.23% 70.33% +0.09%
============================================
Files 112 112
Lines 60602 61278 +676
============================================
+ Hits 42566 43099 +533
- Misses 18036 18179 +143
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/cluster_legacy.c | 85.84% <100.00%> (+0.05%) |
:arrow_up: |
Sorry for dropping the ball on this one, @bentotten. Let me find some time next and get back to this thread.
any update on this?
Thanks for your patience, @bentotten! I think this change makes sense overall. I think we should also consider updating the extension_capable flag right after completing the handshake as below:
https://github.com/valkey-io/valkey/blob/089048d36485bf0d0d202221a7497aec0563bcdd/src/cluster_legacy.c#L3194
otherwise, I think we will have to wait for the next PONG message to fix the flag at:
https://github.com/valkey-io/valkey/blob/089048d36485bf0d0d202221a7497aec0563bcdd/src/cluster_legacy.c#L3070
it looks like adding the name here during the MEET causes
senderto be valid during the next packet received, which hits this logic and frees the node/link before the handshake flag can be removed
Yeah, you are right. Setting the name during MEET alters the handshake sequence. I agree we should not make this change.
Along the same line, I'm now wondering whether we should connect the link and node during the MEET phase (via setClusterNodeToInboundClusterLink). Currently, we establish these "links" (between link and node) after the handshake is completed. Based on the existing implementation, I don't anticipate any issues with moving this logic earlier into the MEET phase. However, I would be a lot more comfortable if we could find a way to instruct clusterSendPing to explicitly set this extension-capable flag without relying on link->node.
Based on some offline discussion, we're going to merge this to see if it helps with another issue and I will create an issue with Ping's comment to followup with.