[NEW] add a management-port
Problem:
Sometimes the primary workload of a Valkey server will become so large that it causes the latency of even trivial commands like PING to become unacceptable. When this happens the server is still healthy and functioning, but it fails health checks that come in on the client port (default 6379). This can then cause monitoring software such as Sentinel to trigger failovers, which only makes the problem worse.
Proposed solution:
Add a new port dedicated to a low-frequency stream of high-urgency management commands. PING is a good example. Other possibilities might be SHUTDOWN and BGSAVE. It would NOT be suitable for a replica client, and it is not intended for internal use for clustering, but it might be reasonable to use it for the CLUSTER administrative sub-commands.
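To make the idea concrete, here is a minimal standalone sketch (plain C, not Valkey code; the function name, port, and behavior are all made up for illustration) of a dedicated listener that answers RESP PING from its own thread, so the reply never waits behind the main workload:

```c
/* Minimal standalone sketch, NOT Valkey code: a dedicated "management"
 * listener that answers RESP PING with +PONG from its own thread, so the
 * reply does not depend on the main workload. All names are hypothetical. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *management_listener(void *arg) {
    int port = *(int *)arg;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* admin-only interface */
    addr.sin_port = htons(port);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(lfd, 16) != 0) {
        perror("management port");
        return NULL;
    }

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) continue;
        char buf[512];
        ssize_t n;
        while ((n = read(cfd, buf, sizeof(buf) - 1)) > 0) {
            buf[n] = '\0';
            /* Only PING is recognized; everything else is refused. */
            if (strstr(buf, "PING") || strstr(buf, "ping"))
                (void)write(cfd, "+PONG\r\n", 7);
            else
                (void)write(cfd, "-ERR unsupported on management port\r\n", 37);
        }
        close(cfd);
    }
    return NULL; /* not reached */
}

/* Example wiring in an embedding program (hypothetical port number):
 *   static int mgmt_port = 7379;
 *   pthread_t tid;
 *   pthread_create(&tid, NULL, management_listener, &mgmt_port);
 */
```

In Valkey itself this would instead hang off the existing networking and event-loop machinery; the point is only that a trivial, isolated responder can stay healthy while the data port is saturated.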
Additional benefit:
This new management port could be assigned to a control-plane while the client port and cluster port remain on the data-plane (as described in Release It! by Michael T. Nygard).
Other solutions considered:
Prioritizing commands. This would require pre-parsing the beginning of each command so that it could be handled out-of-order.
So, the main problem we are trying to solve is that the Valkey engine should be "reliable" for health checks and management operations, and not be stalled by long-running developer applications. An alternative port works well; we could even have it return a very small subset of data in the RESP format from a separate thread so it is always reliable.
However, I think the ideal solution would be to make it so that the engine is practically always available to commands. We can do this in three parts:
- Make it so that health checks can always be served, even during long-running commands. Basically, implement some way to "yield" out of a command to serve a very specific subset of commands (a sketch of this pattern follows the list). We started this work in Redis by allowing modules that yield during a command to disallow any other commands from being executed.
- Add support to mark a client as "higher priority" on the event loop, so that it gets served with some regular cadence. There are a few strategies we could use to mark the clients (local connections, unix connections, based off of an admin flag like in https://github.com/valkey-io/valkey/issues/469). There are also ways to make the client higher priority; we could introduce a second event loop just to poll for those higher-priority connections.
- Make sure that we don't spend too much time on anything else. This includes cleaning up deep command pipelines, and making sure that anything like a flush also periodically does this yielding.
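Here is a self-contained sketch of the yield pattern from the first bullet. Every name in it is hypothetical (none of these are real Valkey APIs), and the stub only marks where the restricted dispatch would happen:

```c
/* Self-contained, hypothetical sketch of the yield pattern; none of these
 * names are real Valkey APIs. A KEYS-style full scan periodically hands
 * control to a restricted dispatcher that only admits an allow-listed
 * subset of commands (e.g. PING, SHUTDOWN NOSAVE). */
#include <stdio.h>

#define YIELD_EVERY_N_KEYS 10000

/* Stub standing in for "pump the event loop once, but execute only commands
 * explicitly marked as safe while another command is yielding". */
static void serveAllowListedCommandsOnce(void) {
    /* A real implementation would poll ready clients and reject anything
     * outside the allow list, similar in spirit to the busy-script cron
     * used for Lua timeouts, but without returning errors to normal
     * clients (per the discussion below). */
}

/* Stand-in for a long-running scan over a large keyspace. */
static long scanKeyspaceWithYield(long nkeys) {
    long matched = 0;
    for (long i = 0; i < nkeys; i++) {
        /* ... pattern matching and reply accumulation would happen here ... */
        matched++;
        if ((i + 1) % YIELD_EVERY_N_KEYS == 0)
            serveAllowListedCommandsOnce(); /* the yield point */
    }
    return matched;
}

int main(void) {
    printf("visited %ld keys\n", scanKeyspaceWithYield(1000000));
    return 0;
}
```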
IO Threads might also be a pathway here. We could have a separate thread start polling on the main event loop if a command has been running too long.
The benefit of prioritization is that it requires no changes from end users and also helps solve the cluster-bus issues.
I agree that yielding inside "long commands" (we probably need to consider all commands lacking the CMD_FAST flag) is a good option. I think we will have to think about which commands we will allow while yielding. For example, it might not be good to start a BGSAVE process while we yield, since that would break save atomicity. I would suggest we start by introducing client/user prioritization and then build a good yielding infrastructure allowing a minimal set of commands. I guess we can start by using the same mechanism used for the Lua timeout, where we use a minimized cron and handle some commands like SCRIPT KILL and SHUTDOWN NOSAVE. Maybe we can work to incrementally extend the commands we allow?
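To illustrate the kind of allow-list being discussed, a simplified predicate might look like this. The struct and flag definitions are stand-ins for illustration, not the real Valkey command table:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in command flags; not the real definitions. */
#define CMD_FAST  (1 << 0)
#define CMD_WRITE (1 << 1)

struct commandSketch {
    const char *name;
    int flags;
};

/* While another command is yielding, admit only fast, non-write commands
 * plus a tiny explicit whitelist (e.g. SHUTDOWN NOSAVE for emergencies). */
static bool allowedWhileYielding(const struct commandSketch *cmd) {
    if (cmd->flags & CMD_WRITE) return false;            /* never writes */
    if (strcmp(cmd->name, "shutdown") == 0) return true; /* explicit whitelist */
    return (cmd->flags & CMD_FAST) != 0;                 /* e.g. PING */
}

int main(void) {
    struct commandSketch ping = {"ping", CMD_FAST};
    struct commandSketch set = {"set", CMD_WRITE};
    printf("ping allowed: %d, set allowed: %d\n",
           allowedWhileYielding(&ping), allowedWhileYielding(&set));
    return 0;
}
```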
> I guess we can start by using the same mechanism used for the Lua timeout
One problem with the Lua timeout is that we throw errors to end users; let's not do that here.
Is there anything to read that would help me understand more about any existing thoughts on how to make long-running commands (e.g., KEYS *) yield to some other commands (e.g., PING)? Requirements, designs, or PRs (rejected or paused)?
Nothing in the open AFAIK.
The Async IO threads refactoring says "Read Operation: The IO thread will only read and parse a single command." This means that they will not "look ahead" to see if there is a PING that needs to be prioritized. Here is one possible solution: add another thread and a work buffer so that we can dequeue commands, scan them to see whether they are PING, and then put them in the work buffer. I do not claim that this is the only way, or even a tolerable way, just "a" way. Can anyone suggest a better way, so that we can compare it to adding a management port?
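As a concrete (and purely illustrative) version of that "scan them to see whether they are PING" step, a peek function over the raw input buffer could look like the following. This is not how Valkey's parser is structured, just a sketch of the check an IO thread might run before routing a command to a priority queue:

```c
/* Sketch, not Valkey code: peek at a client's raw input buffer and report
 * whether the next command is a PING. Handles the two common encodings:
 * inline "PING\r\n" and the RESP array "*1\r\n$4\r\nPING\r\n". */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

static bool nextCommandIsPing(const char *buf, size_t len) {
    if (len >= 6 && strncasecmp(buf, "PING\r\n", 6) == 0)
        return true;                                /* inline command */
    if (len >= 14 && strncmp(buf, "*1\r\n$4\r\n", 8) == 0 &&
        strncasecmp(buf + 8, "PING\r\n", 6) == 0)
        return true;                                /* RESP array form */
    return false;
}

int main(void) {
    const char *req = "*1\r\n$4\r\nPING\r\n";
    printf("is ping: %d\n", nextCommandIsPing(req, strlen(req)));
    return 0;
}
```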
Can we allow I/O threads to serve PING? Next we can allow them to serve cached CLUSTER SLOTS...
If a command like KEYS is yielding, we can only execute some subset of commands, in particular no write commands, so yielding sounds like a somewhat difficult way forward.
I would say no read commands either. I think the intention is just to serve ping/pong messages for health. The point is not to make the core multi-threaded, just to allow it to be responsive while processing a command.
There are two ways that PING could be used: health check and latency check. When doing a latency check you don't want PING to be given special priority. Is there any risk that someone is using PING this way?
> When doing a latency check you don't want PING to be given special priority. Is there any risk that someone is using PING this way?
We use ping that way :D. That is a tradeoff, but I would rather have INFO give a realistic picture of latency as opposed to just relying on the time it takes a PING to respond.
INFO cannot measure total latency because total latency includes the entire network round trip time.
It might be useful for INFO to report an average latency, but even measuring that is problematic. For example, if it isn't a fast command, the latency is different. Would you want INFO to be a command that gets a high priority? What window (1min, 5min, etc.) would you use for calculating the average latency? Perhaps p99 latency would be better. So many options.

Even if the right information is available from INFO, how do you get all of the existing users of PING to switch? Would that make it a breaking change?

The output from INFO is long and ugly when such a small amount of data is being requested - perhaps this use case would be improved by either new options to INFO or a new SPECIFIC-INFO command that accepts a list of fields instead of or in addition to a list of sections, so that clients could fetch just 1 field if that is all they need. Still, if the goal is to measure the total round trip, PING remains superior to INFO if only because it doesn't take any arguments at all.
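For concreteness only, the kind of field selection floated above might look something like this; neither form exists today, and both the FIELDS option and the SPECIFIC-INFO name are hypothetical:

```
INFO FIELDS connected_clients instantaneous_ops_per_sec
SPECIFIC-INFO connected_clients
```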
Reading this again now, I might have totally misunderstood the suggestion. Is the proposal just to support a secondary port that also puts commands on the same main event loop? I thought the suggestion was to have a secondary event loop dedicated to serving requests on this port.
I'm thinking of this as being in the requirements stage. At this point I hope it will be read as a wish for a new feature and, if possible, should be read as implementation neutral. So, if there is a way to get all the same advantages without actually adding a new port, that's fine. PING needs to be prioritized, but how that is done is not a requirement.
Since a "proposed solution" was part of the template, I put one in. It does make it less of a whine, but it is somewhat misleading.
The problem, slow response to PING, is real and a great motivation for this, and would be an immediate and easy way to take advantage of a new port.
What I really want is a new port. It is expected that it will only be used by administrators and that they will not expose it to anything they do not keep tightly locked down (such as an untrusted network). As such, I would like commands on this new port to be given the highest possible priority. This port might be used, for example, to shut down the server, to temporarily suspend serving clients on the main port while dealing with an emergency, or to kill a client that is misbehaving.
An additional requirement: the new port should be allowed to be on a different network card. This would allow for the type of data-center control plane described in Release It! by Michael T. Nygard. In my own mind, this is the most valuable consequence of adding a new port. The control-plane network can be kept clear of data and totally invisible to all but the administrators, and can be used to control both Valkey and non-Valkey servers at the same time. The control plane becomes a natural way for a data-center dashboard to provide administrators with the support they need.
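As a sketch of that layout, the configuration might end up looking something like this. The `bind` and `port` directives exist today; the `management-*` directives are hypothetical and only illustrate the proposal:

```
# Data plane: reachable by application clients (existing directives).
bind 10.0.1.15
port 6379

# Control plane: a second NIC/address that only administrators can reach.
# These directives are hypothetical, part of this proposal, not Valkey today.
management-bind 192.168.100.15
management-port 7379
```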
How would we control access to this admin port? IP addresses? A dedicated admin user (leveraging ACLs)?
In layers. It should definitely honor ACL rules on the port, but the admin id(s) should be decided by the sys-admins. Passwords, of course. If I were configuring such a service port, I'd prefer to put it on a different IP address, and make that IP address invisible to the clients of the server. Ideally, all my servers would be reachable via a control plane using an independent network out of reach of any of the customers of my data center.
It seems we should choose either this management port feature or #666, but not both?
I'm leaning towards preferring the management port, because it relies on network routing configuration to prevent external peers from accessing it, rather than on a list of trusted IP addresses configured in Valkey. That kind of configuration belongs in the network area.
The management port would not be sufficient when Valkey is running on a host where all of the host's IP addresses are exposed to untrustworthy clients. (For example, when the host only has a single IP address.)
The management port would be safer than a trusted/untrusted IP list when it can be placed on a hidden IP address.
Is there any reason to think that someone might want to nest these two safety measures? @dmitrypol asked about the possibility of a buggy monitoring app. Would it make sense to build something like that as a trusted client, but not give it access to the management port?
@hwware - you marked it as completed, but is there a merged PR? I can't find it.