valkey
valkey copied to clipboard
Add admin-port feature
Background:
Some cases, we can run valkey server on a container and multiply containers could be run in a physical machine simultaneously. Administrator wants to execute some special commands in the some servers of them, and general clients are not allowed to run these kinds of commands.
Thus, we want to set an admin port for administrator connection, and general clients can connect with server with port argument.
Thus The better solution is that let admin clients and general clients connect to 2 different port of server.
BWT, the special admin port has more advance functions, including the connect is not related to the limit of maxclients etc
Ref: RFC https://github.com/valkey-io/valkey-rfc/pull/3
Codecov Report
Attention: Patch coverage is 65.21739% with 8 lines in your changes missing coverage. Please review.
Project coverage is 70.68%. Comparing base (
6b3e122) to head (8ac9019). Report is 1 commits behind head on unstable.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/networking.c | 36.36% | 7 Missing :warning: |
| src/server.c | 91.66% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## unstable #1120 +/- ##
============================================
- Coverage 70.83% 70.68% -0.16%
============================================
Files 118 118
Lines 63561 63583 +22
============================================
- Hits 45025 44943 -82
- Misses 18536 18640 +104
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/config.c | 77.63% <ø> (ø) |
|
| src/server.h | 100.00% <ø> (ø) |
|
| src/server.c | 87.40% <91.66%> (-0.02%) |
:arrow_down: |
| src/networking.c | 88.29% <36.36%> (-0.24%) |
:arrow_down: |
Core team discussion:
- Wen will followup if just doing a local connection is sufficient for his use case. That simplifies the management of this new port. If Huawei does want a remote connection, we need to discuss if we will support TLS and how we will do that.
I still think the value of introducing an admin port is limited. When the admin port idea was initially proposed, it was intended to address two scenarios, IIRC:
- Providing more privilege to clients connecting through a special port:
I believe the correct way to handle this use case is through ACLs.
- Working around resource constraints such as
maxclients:
While maxclients is just one example, a harder challenge is when the server’s CPUs are overwhelmed by client traffic, making operator connections unreliable.
Looking at this from the operator's perspective, I see two reasons why an operator needs to connect to the server:
- To run commands like
INFOto collect metrics.
In our case at GCP, we’ve moved to a PUSH model where the server periodically exports its metrics to an external file. This approach avoids the need for direct connections during high-load situations.
- To perform administrative operations (e.g., moving slots, adding replicas).
These operations are generally less time-sensitive and can afford to wait so the priority of ensuring immediate connectivity under resource constraints is lower IMO.
I can create a separate issue to describe the PUSH model in more detail, if folks are interested.
[core team discussion]
- being able to reliably to connect to the server is in general desired, not just for the metrics but also for management scenarios like auto-scaling
need to patch "maxclients", "max fd", and "client eviction" policies.
-
admin port is a general construct that could be reused by multiple parties.
-
security can/should still be achieved via tying the port to the ACLs, somehow.
-
another point to explore is whether these can be achieved in a module.
[decision time]
@valkey-io/core-team, please 👍 or 👎
- Added an admin port dedicated to providing reliable connections only.
- Clients connecting to the admin port do not have any special privileges.
It is the operator's responsibility to determine whether the admin port is firewalled off from the applications.
Separately, if we approve this feature, can we consider renaming the port since it doesn’t provide additional privileges? I’m thinking of names like “service port,” “management port,” or perhaps “aux port.”
Just throwing this out there -
One thing I was thinking is if the problem we actually want to solve is "per port configurations". Or perhaps "per endpoint configurations", if we decouple from just TCP.
If we did this, the case of admin port would simply be setting up an additional endpoint and configuring it with no max clients limit.
Something like:
valkey.conf
...
configure-endpoint type plaintext cluster-broadcast yes maxclients 1000 # user plaintext port
configure-endpoint type tls cluster-broadcast yes maxclients 1000 # user tls port
configure-endpoint type tls maxclients -1 cluster-broadcast no # admin port
I think this opens it up to future improvements (e.g. perhaps prioritization capabilities to support management connections from being starved by user traffic). Or we can combine ACLs into it, e.g. only a certain endpoint can authenticate to certain ACLs, or an endpoint can automatically authenticate to a certain ACL (if the operator wants to use a firewall as their security perimeter). Obviously - we could choose to get as complicated or simple as we'd like with it.
In my head, I'm thinking of Envoy, where you can set up many "listeners", which each has its own configuration.
As a side benefit, it also opens us up to a mechanism for supporting module owned endpoints, where the module is responsible for the whole connection flow, something like:
module load my_module.so
configure-endpoint type my_module_conn_type
Obviously, it is a superset of the problem solved by just the admin port alone, so that comes with its own tradeoffs. But it feels like a more flexible direction then having a prescriptive set of configurations attached to an admin port IMO
One thing I was thinking is if the problem we actually want to solve is "per port configurations".
Spot on. And the file descriptor limit could be solved independently without introducing a new policy (i.e., dynamically extending the events array)
configure-endpoint type tls maxclients -1 cluster-broadcast no # admin port
The only potential downside is that this will need to be a SPECIAL config and we will have to provide custom parsing logic but I feel that the additional complexity is justified.
Just throwing this out there -
One thing I was thinking is if the problem we actually want to solve is "per port configurations". Or perhaps "per endpoint configurations", if we decouple from just TCP.
If we did this, the case of admin port would simply be setting up an additional endpoint and configuring it with no max clients limit.
Something like:
valkey.conf ... configure-endpoint type plaintext cluster-broadcast yes maxclients 1000 # user plaintext port configure-endpoint type tls cluster-broadcast yes maxclients 1000 # user tls port configure-endpoint type tls maxclients -1 cluster-broadcast no # admin portI think this opens it up to future improvements (e.g. perhaps prioritization capabilities to support management connections from being starved by user traffic). Or we can combine ACLs into it, e.g. only a certain endpoint can authenticate to certain ACLs, or an endpoint can automatically authenticate to a certain ACL (if the operator wants to use a firewall as their security perimeter). Obviously - we could choose to get as complicated or simple as we'd like with it.
In my head, I'm thinking of Envoy, where you can set up many "listeners", which each has its own configuration.
As a side benefit, it also opens us up to a mechanism for supporting module owned endpoints, where the module is responsible for the whole connection flow, something like:
module load my_module.so configure-endpoint type my_module_conn_typeObviously, it is a superset of the problem solved by just the admin port alone, so that comes with its own tradeoffs. But it feels like a more flexible direction then having a prescriptive set of configurations attached to an admin port IMO
I really like this idea, we should have it in our agenda.
If we did this, the case of admin port would simply be setting up an additional endpoint and configuring it with no max clients limit.
The syntax seems a bit atrocious, but the idea seems sound to me. I think another thread it was mentioned it would be nice if we could have a native map configuration. So it could instead look something like:
configure-endpoint name=admin type=plaintext cluster-broadcast=yes maxclients=1000
Which comes across as so much more readable to me :) That gets away from the issue Ping mentioned about it having to be special. (but we have more specialness like having more than one of the configuration). Well, we can hash through the syntax.
configure-endpoint name=admin type=plaintext cluster-broadcast=yes maxclients=1000
I have two questions:
-
Will the
maxclientsconfig be changed to limit individual endpoints rather than the entire server? If so, the originalmaxclientsconfig should be deprecated, which is a significant change, and we need to consider issues related to upgrading from older versions. -
The semantics of the
cluster-broadcastattribute confuse me. Is it intended to describe thecluster-bus-port? If so, it should be given a more accurate name since the listening events on the cluster bus are different from the services provided to users. Perhaps a name likeprotocol=resp|clustercould be used.
Will the maxclients config be changed to limit individual endpoints rather than the entire server?
I think it is up for discussion. Probably for the time being - having maxclients-enforcement=off would be more usable. Having per-endpoint resource restriction can be an iterative improvement (or we may want to do something like resource groups, allowing lumping of multiple endpoints into one shared limit, kinda like cgroups for linux)
The semantics of the cluster-broadcast attribute confuse me
The idea is that we broadcast two ports currently to the cluster: port and tls-port. So the idea with this configuration is you would configure at least one port to be the cluster-broadcast port, or optionally one plaintext and one TLS port.
Given the one-to-one mapping and Madelyn's improved syntax proposal, it would probably make more sense like:
configure-endpoint name=user-tcp type=plaintext maxclients-enforcement=on
configure-endpoint name=user-tls type=tls maxclients-enforcement=on
configure-endpoint name=admin type=plaintext maxclients-enforcement=off
cluster-announce-plaintext-endpoint "user-tcp"
cluster-announce-tls-endpoint "user-tls"