command duration
On a Dragonfly instance there is no command duration information in the metrics output. When troubleshooting problems, the business side pays the most attention to the command duration indicator.
We already collect stats about request execution time into request_latency_usec in main_service.cc, so it might be just a matter of exporting it.
Hi, I am thinking about solving this issue as a first issue.
Is this a wanted feature? Is it ready to be implemented? If so, @royjacobson, what do you mean by exporting it? Should it be exported via the output of a CLI command?
@dor132 @royjacobson Yes, this is the performance parameter we need most. Can we put this in the metrics output? That way it would be more convenient for us to display the command-execution latency of an instance through Prometheus and Grafana.
Adding to what @kissingtiger said:
Dragonfly exposes metrics data in the Prometheus format through the dfly:6379/metrics HTTP page. To add a new metric, you can look at the code of the PrintPrometheusMetrics function in server_family.cc.
The web page should look like this -
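To make the shape concrete, here is a sketch of how such a metric could be appended to the Prometheus text output. The metric names and the helper function are illustrative assumptions on my side, not the actual code in server_family.cc:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative sketch only: the metric names and this helper are assumptions,
// not Dragonfly's actual PrintPrometheusMetrics implementation.
std::string AppendCommandDurationMetric(const std::string& cmd, uint64_t calls,
                                        uint64_t total_usec) {
  std::ostringstream os;
  os << "# TYPE dragonfly_cmd_duration_seconds_total counter\n"
     << "dragonfly_cmd_duration_seconds_total{cmd=\"" << cmd << "\"} "
     << total_usec / 1e6 << "\n"
     << "dragonfly_cmd_calls_total{cmd=\"" << cmd << "\"} " << calls << "\n";
  return os.str();
}
```

Exposing monotonically increasing totals (rather than precomputed averages) lets Prometheus derive rates and averages on the query side.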
@royjacobson Displaying it on this webpage would be perfect and consistent with my understanding.
Is this expected to land in the next version?
The screenshot is of the current metrics page, which doesn't contain command duration metrics yet.
@royjacobson OK, we need command duration metrics.
Ok. I see three options to display the command duration in the metrics page:
- an average for each command type
- an exact time per command executed
- an average for all commands executed up until a certain time
Should it be one of these options, or a completely different thing I missed?
@royjacobson @kissingtiger What do you think?
@dor132 very good! I'm looking forward to it. Is this feature expected to be available in the next version?
Hi @dor132, I see that Redis returns the following metrics in `info commandstats`: calls=XXX, usec=XXX, usec_per_call=XXX. Currently Dragonfly returns only calls=XXX. I think it would be good to add usec and usec_per_call to `info commandstats` and to the metrics page. usec is the total time per command, and usec_per_call is the average, i.e. usec/calls.
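The arithmetic being proposed is simple: usec_per_call is the total time divided by the call count. A minimal sketch (the function name is mine for illustration, not Redis or Dragonfly code):

```cpp
#include <cstdint>

// usec_per_call as described above: total command time divided by the
// number of invocations. Guards against division by zero for unused commands.
double UsecPerCall(uint64_t total_usec, uint64_t calls) {
  return calls == 0 ? 0.0 : static_cast<double>(total_usec) / calls;
}
```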
Hey, any chance to get this assigned?
It has also been requested by one of our customers, "so we can actually see how dragonfly is performing".
(They referred to exporting it as Prometheus metrics.)
I think it would be very neat to export the command duration/latency as a Prometheus histogram as well.
It's not a trivial task to gather histograms efficiently. Basically you need a high-resolution histogram per command per thread. That's 200 x 100 x |threads| space, and even then you cannot necessarily compute predefined percentiles (only approximations of them).
Btw, Redis latencies are global across the process lifetime, meaning they are not decaying. We can start with that.
(For comparison, our HTTP latencies are moving averages, which are more useful imho.)
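To illustrate the space tradeoff described above, here is a minimal sketch of a fixed-bucket, per-thread latency histogram. Bucket bounds and names are illustrative, not Dragonfly code; the point is that each (command, thread) pair holds its own bucket array, and aggregation is a cheap element-wise sum:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch of one histogram cell in the (commands x buckets x threads) layout.
struct LatencyHistogram {
  // Illustrative upper bounds in microseconds; real code would likely use
  // many more, exponentially spaced buckets.
  static constexpr std::array<uint64_t, 4> kBoundsUsec = {100, 1000, 10000,
                                                          100000};
  std::array<uint64_t, 5> counts{};  // one extra bucket for overflow

  void Record(uint64_t usec) {
    std::size_t i = 0;
    while (i < kBoundsUsec.size() && usec > kBoundsUsec[i]) ++i;
    ++counts[i];
  }

  // Merging per-thread histograms is an element-wise sum, so aggregating
  // at scrape time stays cheap even with many threads.
  void Merge(const LatencyHistogram& other) {
    for (std::size_t i = 0; i < counts.size(); ++i) counts[i] += other.counts[i];
  }
};
```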
When evaluating Redis Cluster vs DragonflyDB this is the most immediate gap that we've noticed - we can only judge Dragonfly's performance from outside-in using our application metrics, but there is no latency information provided by Dragonfly itself. This is a noticeable regression vs our current setup. Would love to be able to see types of commands executed along with latencies.
We will introduce this feature soon
I kind of assumed that the dashboards people use just collect the data over time and compute some kind of a derivative to show the 'current' latencies. Is this not the case?
Re: histograms, I agree it's a lot more work, but I don't think it's that hard, and I think it would be useful if we could do it intelligently.
You are right, exposing count and sum counters that are monotonically increasing is enough for computing a moving average.
And I just noticed Redis 7 even exposes percentiles. In any case that will be a different task with a different priority.
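For reference, here is the count/sum arithmetic sketched in code: given two scrapes of monotonically increasing counters, the average latency over the interval is the ratio of the deltas (this is what a dashboard's rate(sum)/rate(count) query computes; names here are illustrative):

```cpp
#include <cstdint>

// Average latency between two scrapes of monotonic sum/count counters:
// delta(sum) / delta(count). Returns 0 when no commands ran in the interval.
double AvgLatencyUsec(uint64_t sum0, uint64_t count0, uint64_t sum1,
                      uint64_t count1) {
  uint64_t delta_count = count1 - count0;
  return delta_count == 0
             ? 0.0
             : static_cast<double>(sum1 - sum0) / delta_count;
}
```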
The web says t-digest is the data structure for probabilistic estimation of percentiles over online data, and Redis Labs even released one as part of their Stack API: https://redis.com/blog/t-digest-in-redis-stack/