
Change the usec_per_call operation from float to long long division

Open bluayer opened this issue 9 months ago • 6 comments

Affected command by this change

INFO commandstats (only the usec_per_call metric)

Changed results example

cmdstat_set:calls=2,usec=36,usec_per_call=18,rejected_calls=1,failed_calls=0
cmdstat_get:calls=3,usec=13,usec_per_call=4,rejected_calls=1,failed_calls=0
cmdstat_info:calls=6,usec=444,usec_per_call=74,rejected_calls=0,failed_calls=0
cmdstat_command|docs:calls=2,usec=3951,usec_per_call=1975,rejected_calls=0,failed_calls=0

Context

The usec_per_call is a simple AVG metric that divides the total execution time by the number of calls. However, the calculation was performed by converting long long to float. For three reasons, usec_per_call does not require float-level accuracy:

  1. There is little reason to know the average value in units smaller than microseconds.
  2. The current output only keeps two digits after the decimal point anyway.
  3. The client (or monitoring tool) can compute the average itself, because the calls and usec metrics are already exposed.

Therefore, I propose changing the implementation to a simple long long division operation without the floating-point conversion and computation.
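As a rough sketch of the proposed difference (this is not the actual valkey source; the struct and function names below are invented for illustration):

```c
#include <stdio.h>

/* Hypothetical per-command counters, modeled on what commandstats tracks. */
struct cmd_stats {
    long long calls;        /* completed calls */
    long long microseconds; /* total execution time in usec */
};

/* Current style (sketch): cast to float, print two decimals. */
static void print_float_avg(const struct cmd_stats *s) {
    float per_call = s->calls ? (float)s->microseconds / s->calls : 0.0f;
    printf("usec_per_call=%.2f\n", per_call);
}

/* Proposed style (sketch): plain long long division, no cast. */
static void print_int_avg(const struct cmd_stats *s) {
    long long per_call = s->calls ? s->microseconds / s->calls : 0;
    printf("usec_per_call=%lld\n", per_call);
}

int main(void) {
    struct cmd_stats set_cmd = {.calls = 2, .microseconds = 36};
    print_float_avg(&set_cmd); /* usec_per_call=18.00 */
    print_int_avg(&set_cmd);   /* usec_per_call=18 */
    return 0;
}
```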

This change will benefit users who run a wide variety of commands in valkey and poll the commandstats metrics at high resolution.

Alternatives

The engine already exposes the calls and usec metrics, so clients or users can calculate usec_per_call accurately enough themselves. Meanwhile, the engine needs to perform a type conversion and a division to produce this metric for every command and subcommand.

Ultimately I would like to propose removing this metric, but that would be a breaking change and would impact many clients. Therefore, I first propose removing the unnecessary float conversion.
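For example, a client can derive the average from a captured commandstats line; a minimal sketch (the parsing is simplified for illustration):

```c
#include <stdio.h>

int main(void) {
    /* A commandstats line as a client might capture it from INFO. */
    const char *line = "cmdstat_set:calls=2,usec=36,usec_per_call=18,"
                       "rejected_calls=1,failed_calls=0";
    long long calls = 0, usec = 0;

    /* Parse only the two fields needed to derive the average. */
    if (sscanf(line, "cmdstat_set:calls=%lld,usec=%lld", &calls, &usec) == 2 &&
        calls > 0) {
        /* Full double precision, computed off the server's hot path. */
        printf("usec_per_call = %.3f\n", (double)usec / (double)calls);
    }
    return 0;
}
```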

Simple performance testing (Updated at Apr 15)

Env

  • EC2 r6i.2xlarge
  • Division of random long long numbers; 10000 iterations, 10 runs
  • Compared division with type casting (float) against division without it (long long; written as "int division" below) (a sketch of such a harness follows this list)
  • Used rdtsc to read CPU cycles
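A sketch of what such a harness might look like (my reconstruction under the assumptions above; the actual benchmark code is not shown in this issue):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h> /* __rdtsc(), x86 GCC/Clang */

#define N 10000

int main(void) {
    static long long nums[N], divs[N];
    for (int i = 0; i < N; i++) {
        nums[i] = (long long)rand() * rand() + 1; /* random dividends */
        divs[i] = (long long)rand() + 1;          /* nonzero divisors */
    }

    volatile float f_sink = 0;     /* volatile sinks keep the loops alive */
    volatile long long i_sink = 0;

    uint64_t t0 = __rdtsc();
    for (int i = 0; i < N; i++) f_sink += (float)nums[i] / divs[i]; /* with cast */
    uint64_t float_cycles = __rdtsc() - t0;

    t0 = __rdtsc();
    for (int i = 0; i < N; i++) i_sink += nums[i] / divs[i]; /* "int division" */
    uint64_t int_cycles = __rdtsc() - t0;

    printf("float cast: %llu cycles, int division: %llu cycles\n",
           (unsigned long long)float_cycles, (unsigned long long)int_cycles);
    return 0;
}
```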

[Screenshot (2025-04-14): benchmark results comparing float-cast division against long long division]

bluayer · Mar 26 '25 13:03

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.02%. Comparing base (d56a1f7) to head (017fbe6).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1888      +/-   ##
============================================
- Coverage     71.06%   71.02%   -0.04%     
============================================
  Files           123      123              
  Lines         65671    65671              
============================================
- Hits          46669    46646      -23     
- Misses        19002    19025      +23     
Files with missing lines | Coverage | Δ
src/server.c | 87.54% <100.00%> | (ø)

... and 9 files with indirect coverage changes


codecov[bot] · Mar 26 '25 14:03

I have some concerns about this change.

It breaks existing behavior: commands that take <1µs will now just show 0, which looks off, and we lose useful precision, especially when profiling fast commands.

I still don't quite understand why this change is needed. Could you provide an example where the current float-based calculation causes issues?

UPDATE: I saw that the timing precision is in µs, which means the reported usec_per_call is not accurate anyway. So we either fix the accuracy or we remove these decimal digits (which is what this change does)...

xbasel · Mar 27 '25 21:03

@xbasel I'm sorry for the late response.

The average metric (usec_per_call) can be calculated precisely enough on the client side using the existing calls and usec metrics, which are already exposed.

Would you please explain your concerns in more detail? I'm curious whether nanosecond-level precision in this average metric is truly necessary.

Update: added simple performance test results.

bluayer · Apr 14 '25 03:04

Question regarding the benchmark: Is it 25 microseconds for 10000 divisions? We save 5 microseconds per 10000 divisions? That's not much per division.

zuiderkwast · Apr 15 '25 07:04

Yes. But if users are polling these metrics at high resolution, the impact could be greater. And according to the previous comment (on the closed PR #1620), I understand that we are continuously improving INFO performance and that it's recommended to calculate derivable metrics on the client.

However, it seems difficult to remove the metric, so I suggest a small improvement first.

bluayer · Apr 16 '25 00:04

I still think keeping sub-microsecond precision in usec_per_call is useful for usability and diagnostics, even if clients can recompute it. Removing it may degrade precision for fast commands. I get that current timings are at microsecond granularity, but that can be fixed.

It's also unclear whether float division here causes real-world performance issues; if it does, SIMD acceleration could be considered instead of dropping precision.

Just to clarify, this isn't about nanosecond precision; it's about sub-microsecond precision. For example, a command taking 300ns would still register as 0µs if we drop the decimals. That level of detail can matter when analyzing fast commands like GET, SET, and INCR, and when comparing pre- and post-upgrade scenarios.
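A tiny illustration of that loss, using hypothetical numbers (not taken from this thread):

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical fast command: 10 calls totaling 3 usec (~300ns each). */
    long long calls = 10, usec = 3;

    printf("float: usec_per_call=%.2f\n", (float)usec / calls); /* 0.30 */
    printf("int:   usec_per_call=%lld\n", usec / calls);        /* 0    */
    return 0;
}
```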

xbasel · Apr 30 '25 10:04