Connection contention
I was doing some perf tuning, and by doubling the number of connections my latencies got much better. I understand that connections are shared and that more connections are generally "better", but I'd rather have a way to monitor contention on the pool (goroutines waiting for a connection) than do guesswork, especially since the size of my worker pool is parameterized.
Is there a way to expose connection contention? Maybe a log enabled via config?
Thanks a lot for the great project!
Hi @ltagliamonte , can you clarify if you're using v3 or v4? The way the Pool works in each is extremely different, so it's not possible to answer without knowing which we're talking about.
Hello @mediocregopher I'm using v4
Nice, thanks. So in v4 this question is a bit tricky, because there are potentially two places which could be blocking:
- Getting a Conn from the Pool. This can block if all Conns have been removed from the Pool, and it's currently empty. A Conn is only removed from the Pool if the Action which is going to be performed is not shareable, which 99% of the time means it is a blocking command like BRPOP, otherwise the Conn is left in the Pool and shared with other shareable Actions. So for the Pool to be empty (and therefore blocking) you'd have to be doing more non-shareable Actions than there are Conns in the Pool.
If you want to know how many non-shareable Actions are taking place within your Pool you could inspect it using a very simple interface:
```go
import (
	"context"
	"sync/atomic"

	"github.com/mediocregopher/radix/v4"
)

type poolWrapper struct {
	radix.Client
	nonShareableActionsGauge atomic.Int64 // Int64, so it can be decremented
}

func (pw *poolWrapper) Do(ctx context.Context, a radix.Action) error {
	if !a.Properties().CanShareConn {
		pw.nonShareableActionsGauge.Add(1)
		defer pw.nonShareableActionsGauge.Add(-1)
		// Or however you want to measure it
	}
	return pw.Client.Do(ctx, a)
}

// Spin up a goroutine to periodically log nonShareableActionsGauge.Load()
```
- For Actions which are shareable, their EncodeDecode calls will be automatically pipelined within the Conn. In effect, any blocking which happens at this stage is a result of network congestion, where either the time it takes to write to the socket or read responses back from it is preventing subsequent Actions from having their turn. If you want to know how many Actions are blocked at this stage you could essentially do the opposite of the example above: increment a counter for every active shareable Action. Dividing that by the Pool size would give you roughly the current number of Actions which are blocked per Conn.
What you asked for, a log message like "Action is blocked because the Pool is too small", is unfortunately not something which can be easily determined, because all Actions block for some amount of time; the only question is how long is acceptable. If you're using a metrics server like Prometheus then a wrapper like the above can be a great place to record Action times on a histogram, and once the time it takes to Do an Action has gotten too high you can increase the Pool size some more. If you're not using Prometheus you could use an in-memory histogram library to the same effect.
One final note, which doesn't answer your question but might help, is to check out the WriteFlushInterval field of the Dialer if you haven't yet. By setting that to something like 150 microseconds you can increase the overall throughput of Conns, as it will reduce the number of system calls being made even further.
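For reference, setting that field would look something like the following sketch (the helper name and address are illustrative, assuming the v4 `PoolConfig`/`Dialer` API):

```go
import (
	"context"
	"time"

	"github.com/mediocregopher/radix/v4"
)

func newClient(ctx context.Context) (radix.Client, error) {
	cfg := radix.PoolConfig{
		Dialer: radix.Dialer{
			// Batch writes for up to 150µs before flushing, trading a tiny
			// amount of latency for fewer syscalls and higher throughput.
			WriteFlushInterval: 150 * time.Microsecond,
		},
	}
	return cfg.New(ctx, "tcp", "127.0.0.1:6379")
}
```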
Thanks for the insight @mediocregopher. Just to make sure: from the code, are custom cmds all treated as shareable by the library? I recently introduced a new CMD in kvrocks, HSETEXPIRE
@ltagliamonte-dd Yup that's correct, all commands are considered shareable by default, only the blocking commands are exceptions. If you wanted to set different ActionProperties for an Action you could use a custom CmdConfig, or re-implement the Properties method, but I think in this case you don't need to :)
thank you @mediocregopher, what happens with Pipelines? Is it just an abstraction, where the lib goes over each cmd and decides whether or not to share the connection?
For Pipeline there are two properties in play. First and foremost is the CanPipeline field of ActionProperties. If any Action is added to a Pipeline with CanPipeline being false then radix will panic.
The other property is indeed CanShareConn. If any individual Actions are added to a Pipeline with CanShareConn being false then the entire Pipeline will have CanShareConn of false.
In practice there aren't any standard redis commands which can be pipelined but not shared (though I could well be forgetting one), so Pipelines will all be shareable. Perhaps in kvrocks there is such a command, but HSETEXPIRE doesn't seem like one to me.