dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

command duration

Open kissingtiger opened this issue 2 years ago • 8 comments

Dragonfly instance, there is no command duration information in the metrics information. When troubleshooting problems, the business side pays more attention to the command duration indicator.

kissingtiger avatar Jun 08 '23 08:06 kissingtiger

We already collect stats about request execution time into request_latency_usec in main_service.cc, so it might be just a matter of exporting it.

royjacobson avatar Jun 12 '23 09:06 royjacobson

Hi, I am thinking about solving this issue as a first issue.

Is this a wanted feature? Is it ready to be implemented? If so, @royjacobson, what do you mean by exporting it? Should it be exported via output of a CLI command?

dor132 avatar Jun 29 '23 18:06 dor132

@dor132 @royjacobson Yes, this is the performance parameter we need more. Can we put this in the metrics output? In this way, it is more convenient for us to display the delay data of instance execution commands through Prometheus and Grafana

kissingtiger avatar Jun 30 '23 02:06 kissingtiger

Adding to what @kissingtiger said:

Dragonfly exposes metrics data in the Prometheus format through the dfly:6379/metrics HTTP page. To add a new metric, you can look at the code of the PrintPrometheusMetrics function in server_family.cc.

The web page should look like this -

image

royjacobson avatar Jul 02 '23 09:07 royjacobson

@royjacobson Displayed on this webpage, it is very perfect and consistent with my understanding.

Is this expected to be launched in the next version?

kissingtiger avatar Jul 02 '23 10:07 kissingtiger

@royjacobson Displayed on this webpage, it is very perfect and consistent with my understanding.

Is this expected to be launched in the next version?

The screenshot is of the current metrics page, which doesn't contain command duration metrics yet.

royjacobson avatar Jul 02 '23 10:07 royjacobson

@royjacobson OK,we need command duration metrics .

kissingtiger avatar Jul 03 '23 08:07 kissingtiger

@royjacobson Displayed on this webpage, it is very perfect and consistent with my understanding.

Is this expected to be launched in the next version?

The screenshot is of the current metrics page, which doesn't contain command duration metrics yet.

Ok. I see three options to display the command duration in the metrics page:

  • an average for each command type
  • an exact time per command executed
  • an average for all commands executed up until a certain time

Should it be one of these options, or a completely different thing I missed?

dor132 avatar Jul 04 '23 11:07 dor132

@royjacobson @kissingtiger What do you think?

dor132 avatar Jul 10 '23 04:07 dor132

@dor132 very good! I'm looking forward to it. Is this feature expected to be available in the next version?

kissingtiger avatar Jul 10 '23 08:07 kissingtiger

Hi @dor132 I see that redis returns in info commandstats the following mertrics calls=XXX,usec=XXX,usec_per_call=XXX currently dragonfly returns only the calls=XXX I think it will be good to add the use and the usec_per_call in the info commandstats and add this to metrics page the usec is the total time per command and the usec_per_call is the average i.e usec/calls

adiholden avatar Jul 10 '23 12:07 adiholden

Hey, any chance to get this assigned?

It has also been requested by one of our costumes "so we can actually see how dragonfly is performing".

(they refered to exporting it as Prometheus metrics)

ashotland avatar Jul 13 '23 14:07 ashotland

I think it would be very neat to export the command duration/latency as a prometheus histrogram as well.

royjacobson avatar Jul 13 '23 15:07 royjacobson

It's not a trivial task to gather histograms efficiently. Basically you need a high-resolution histogram per each command for each thread. It's 200 x 100 x |threads| space and even then, you can not compute necessarily compute predefined percentiles (only their approximations).
Btw, redis latencies are global across the process lifetime, IMHO, meaning they are not decaying. We can start with that. (for comparison, our HTTP latencies are moving averages, which are more useful imho ).

romange avatar Jul 13 '23 17:07 romange

When evaluating Redis Cluster vs DragonflyDB this is the most immediate gap that we've noticed - we can only judge Dragonfly's performance from outside-in using our application metrics, but there is no latency information provided by Dragonfly itself. This is a noticeable regression vs our current setup. Would love to be able to see types of commands executed along with latencies.

dmitri-lerko avatar Jul 14 '23 19:07 dmitri-lerko

We will introduce this feature soon

romange avatar Jul 15 '23 09:07 romange

It's not a trivial task to gather histograms efficiently. Basically you need a high-resolution histogram per each command for each thread. It's 200 x 100 x |threads| space and even then, you can not compute necessarily compute predefined percentiles (only their approximations). Btw, redis latencies are global across the process lifetime, IMHO, meaning they are not decaying. We can start with that. (for comparison, our HTTP latencies are moving averages, which are more useful imho ).

I kind of assumed that the dashboards people use just collect the data over time and compute some kind of a derivative to show the 'current' latencies. Is this not the case?

re:histograms, I agree it's a lot more work but I don't think it's that hard and I think it's useful if we could do it intelligently.

royjacobson avatar Jul 16 '23 17:07 royjacobson

you are right, exposing count, sum that are monotonically increasing is enough for computing a moving average. And i just noticed redis7 even exposes percentiles. In any case it will be a different task and different priority.

image

romange avatar Jul 17 '23 05:07 romange

Web says t-digest is the data structure for probabilistic estimation of percentiles for online data and Redis Labs even released a data structure as part of their Stack API: https://redis.com/blog/t-digest-in-redis-stack/

romange avatar Jul 17 '23 05:07 romange