valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Cluster: Assign default human-readable node name when starting server

Open zhijun42 opened this issue 2 months ago • 10 comments

By default, servers in a cluster don't have a human-readable nodename unless explicitly set by user. This can be annoying when reading the server logs during local development, since we would need to manually lookup the unique node ID (40 chars long) to figure out its port number. Most logging statements in the cluster are in the form of serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, node->human_nodename); , and thus this would always result in an empty () like the following:

2832:M 26 Oct 2025 18:00:14.033 * Node dd4f43d9e4a1a2875a514733c08d4870c21db682 () is now a replica of node 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 () in shard d3fc557d12ab3c5a8850c7b0c9afb252b473cad1
2832:M 26 Oct 2025 18:00:14.033 * Clear FAIL state for node dd4f43d9e4a1a2875a514733c08d4870c21db682 (): replica is reachable again.
2832:M 26 Oct 2025 19:21:07.191 * NODE 524978a9924b5841c4c93f83581d202cfbce1443 () possibly failing.
2832:M 26 Oct 2025 19:21:07.838 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 524978a9924b5841c4c93f83581d202cfbce1443 ()
2832:M 26 Oct 2025 19:22:40.434 * Clear FAIL state for node 524978a9924b5841c4c93f83581d202cfbce1443 (): replica is reachable again.
2832:M 26 Oct 2025 19:46:43.539 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 ()
2832:M 26 Oct 2025 19:46:43.539 # Cluster state changed: fail

Assigning a default human-readable nodename solves the annoyance:

7614:S 26 Oct 2025 19:52:21.642 * NODE 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) possibly failing.
7614:S 26 Oct 2025 19:52:22.550 * Marking node 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:15.915 * NODE b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) possibly failing.
7614:S 26 Oct 2025 19:54:16.117 * Marking node b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:16.117 # Cluster state changed: fail
7614:S 26 Oct 2025 19:54:16.117 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
7614:S 26 Oct 2025 19:54:16.574 * Reconfiguring node 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005) as primary for shard 8488c1c467066206db60b136e9a38e1f583eb1f8

For now I use the node's IP and port combination as the nodename. If users have personal preference on the format, they can always update nodename via CONFIG SET cluster-announce-human-nodename MYNAME command or edit the config file.

If users prefer nodenames remain empty, they can run CONFIG SET cluster-announce-human-nodename "" to reset.

zhijun42 avatar Oct 26 '25 12:10 zhijun42

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 72.49%. Comparing base (d16788e) to head (bb07a67).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2777      +/-   ##
============================================
+ Coverage     72.45%   72.49%   +0.04%     
============================================
  Files           128      128              
  Lines         70485    70498      +13     
============================================
+ Hits          51068    51111      +43     
+ Misses        19417    19387      -30     
Files with missing lines Coverage Δ
src/cluster_legacy.c 88.00% <100.00%> (+0.45%) :arrow_up:

... and 10 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Oct 27 '25 05:10 codecov[bot]

On this note, Docker's approach of generating random name for resource has always been interesting, easier to spot/remember.

https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go

hpatro avatar Oct 28 '25 15:10 hpatro

@hpatro Thanks for reviewing!

I generally use the first/last few characters of node-id while searching and I've never seen it conflicting with other node's id.

Yeah the first/last few characters of node ID are unique enough. I assume your use case is perhaps like this: You have a node that did something interesting and/or you're interested in it, then you search through the logs using that node ID to see how this node interacted with others.

My use case is a bit different though. When developing locally (non Docker/K8s), I have a mini cluster running where each node is running in a terminal tab. I'd like to fully observe all behaviors of the cluster, so inside each terminal tab I'm reading through this node’s log stream and would like to instantly recognize peers it’s communicating with. A 40-hex node ID is cognitively heavy; 127.0.0.1:7002 is instant.

Note that I'm not searching for any particular node, I'm simply reading the entire log stream. And having logs like these would be very useful for me:

* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 747e0f97a58cd5a7ac4c2130417fc47a807802ed (127.0.0.1:7004)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001)

Wouldn't you be able to set this externally on the engine by yourself?

Yeah we can always manually assign names to nodes, but I don't wanna bother doing this every time I start up a cluster. The default combination of IP + port proposed here is pretty recognizable for me.

On this note, Docker's approach of generating random name for resource has always been interesting, easier to spot/remember.

TIL! Browsed through the list and just found a lot of great guys in our history. We can probably do the same thing here in Valkey. My approach of using IP + port is functionally the same as using scientists' names - just helping developers associate a server node with some string they can immediately recognize. Since I can easily recognize IP + port, that is what I propose here.

The thing is Valkey cluster has most logs print out the node ID and nodename, but the nodename is always empty unless explicitly set, which is not great user experience for me.

zhijun42 avatar Oct 29 '25 00:10 zhijun42

Fair. However, I don't see a mechanism for a user to disable printing the human nodename i.e. one can't set to empty. Rather setting it to empty would lead to printing ip:port. Would this be considered a breaking change?

@zuiderkwast / @enjoy-binbin thoughts ?

hpatro avatar Nov 19 '25 00:11 hpatro

Fair. However, I don't see a mechanism for a user to disable printing the human nodename i.e. one can't set to empty. Rather setting it to empty would lead to printing ip:port.

Good catch! I just pushed some changes to allow resetting the human nodename by running CONFIG SET cluster-announce-human-nodename "". Also added a test for this.

Would this be considered a breaking change?

Human nodename is a relatively new feature introduced in June 2023 and it's mostly used in logging. The only usage not in logging that I'm aware of is in the nodes.conf file like this:

f130ef4be2a5a3bdad469bba0f82e8f27e5411f6 127.0.0.1:7004@17004,,tls-port=0,nodename=127.0.0.1:7004,shard-id=b5551cc30b41cbdbc78628e77d0d63e10543a1e7 slave 9a4b1f71ed90fd482db902949169e846e600ce30 0 1763554378000 4 connected
vars currentEpoch 5 lastVoteEpoch 5

but the human nodename won't show up there if it's empty string. So overall I think we're pretty safe here.

zhijun42 avatar Nov 19 '25 12:11 zhijun42

Fair. However, I don't see a mechanism for a user to disable printing the human nodename i.e. one can't set to empty. Rather setting it to empty would lead to printing ip:port. Would this be considered a breaking change?

@zuiderkwast / @enjoy-binbin thoughts ?

I don't think changes to the log messages are considered breaking changes. I think it's safe to change what's being logged.

However, if we store "127.0.0.1:7008" as a node's human node name internally, it also affects the output of commands like CLUSTER SHARDS and CLUSTER NODES where this would be appear as the human nodename, which it isn't really. Instead, I think we should just change what's being logged.

I do like to see this in the log as a fallback, instead of just and empty () as you would for any user that's not using the human nodename feature.

Can't we just change the logging, so wherever we log like ...

serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, node->human_nodename);

... we change it to something like ...

serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, humanReadableName(node));

... which would expand to either the configured human nodename or ip:port.

zuiderkwast avatar Nov 19 '25 14:11 zuiderkwast

@zuiderkwast

it also affects the output of commands like CLUSTER SHARDS and CLUSTER NODES

Actually it won't. CLUSTER SHARDS shows the node's hostname but not the human nodename. CLUSTER NODES doesn't show human nodename either, according to function clusterGenNodeDescription.

Can't we just change the logging, which would expand to either the configured human nodename or ip:port.

This is a good idea, but to do so we can't support the feature "Users may not like ip:port and want to reset human nodes to empty string" (although I'm not sure if this feature is actually desirable). From node A's point of view, node B's human nodename is empty string, and node A couldn't tell whether that's because of user not using the human nodename feature, or user explicitly setting human nodename to empty string.

In my current implementation, node A knows node B's human nodename is "127.0.0.1:7008", and if node A later sees this name become empty string, it knows the user sets so.

zhijun42 avatar Nov 19 '25 15:11 zhijun42

Actually it won't. CLUSTER SHARDS shows the node's hostname but not the human nodename.

Oh, right. But it's not unlikely that we'll add human nodename to CLUSTER SHARDS in the future. We added shard-id to CLUSTER SHARDS just recently in #2568.

Can't we just change the logging, which would expand to either the configured human nodename or ip:port.

This is a good idea, but to do so we can't support the feature "Users may not like ip:port and want to reset human nodes to empty string" (although I'm not sure if this feature is actually desirable).

To me, it is not desirable. If we do this a conditional in the logging, then we'll not say that we automatically assign a human nodename. We just log something useful instead of the human nodename when it's missing.

zuiderkwast avatar Nov 19 '25 15:11 zuiderkwast

@zuiderkwast Sounds good! I just added the helper function humanNodename and removed all changes previously used for feature "Users reset human nodename to empty".

zhijun42 avatar Nov 20 '25 05:11 zhijun42

@zuiderkwast Friendly ping ☺️

zhijun42 avatar Nov 26 '25 14:11 zhijun42

@hwware nodename is your feature, do you want to comment or review this one?

enjoy-binbin avatar Nov 28 '25 02:11 enjoy-binbin

@enjoy-binbin

ip:port could be the nodename, but they are not the same thing

My intent is not to say “ip:port is the same as nodename”, but: “if the user didn’t define any human label, then ip:port is a reasonable default label to help the user easily recognize each node.”

nodename was added long time ago, now i guess people are already familiar with this empty ()

From git I see this is a relatively new feature introduced in June 2023. Developers being familiar with empty nodename is actually an indication that developers probably don't know about this feature and simply don't use it.

My thinking is:

  • The log format stays the same, just the content inside () becomes actually helpful.
  • There’s no behavior change in cluster state, gossip, failover, etc. This is for logging only.
  • For developers who never cared about the empty (), this is just a small quality-of-life improvement: suddenly the logs tell them which node is which, without extra node ID lookups.
  • For developers who rely on custom names they prefer, nothing changes – their explicit nodename overrides the default.

I think we can add more info to improve the logging

I'd always appreciate better logging. After some heavy debugging in Valkey cluster, I realize it greatly helps to be able to quickly recognize which node sends what packet to which node, so I propose this PR and believe it will also help other developers a lot.

i like the change in the TCL tests, we set the nodename for each server so that we can lookup it. So for the real setup, people should do the same thing.

Really glad that you enjoy the TCL change! This is actually the beauty of having useful default values. In tests we can auto assign human nodenames like R0, R1, R2 etc, but in local/production we can't simply do that.

The human nodename feature is powerful, but it can only be useful when the nodename is non empty. If some developers are already assigning nodenames they prefer at cluster start up every time, that's great, and my PR won't have any impact on them. But for people who don't explicitly assign nodenames, I believe ip:port is a reasonable fallback.

Overall I think it gives a big readability win at very low risk.

zhijun42 avatar Nov 28 '25 03:11 zhijun42

So the main purpose is for the local env debug? It is not a real request for the production env?

@enjoy-binbin It can be very useful when reading logs from production env where the user didn't set human nodename. As noted, many users don't use the human nodename feature.

I think we can add more info to improve the logging, but part of me don't think we should give the default value.

Yeah, after review changes, this PR only affects the logging. It doesn't assign a default nodename.

@zhijun42 I think you should update the title. "Assign default human-readable node name when starting server" is not correct anymore.

zuiderkwast avatar Nov 28 '25 10:11 zuiderkwast

@zuiderkwast Thanks for the reminder! The outdated content did cause much confusion.

Updated the title and description accordingly.

zhijun42 avatar Nov 28 '25 10:11 zhijun42

@zuiderkwast Should we merge this now that it's approved?

zhijun42 avatar Dec 05 '25 11:12 zhijun42

@zuiderkwast Should we merge this now that it's approved?

Yes, but first I want to give some time for the other reviewers @enjoy-binbin and @hpatro to comment. I will merge it after a week if there are no comments.

zuiderkwast avatar Dec 05 '25 11:12 zuiderkwast