Implement HistogramVec via Lua C bindings to the Rust prometheus crate

Open · ochaton opened this issue 8 months ago · 3 comments

I suggest taking a look at this PoC (good enough, IMO). It demonstrates the power of Lua bindings to the Rust ecosystem and how they can greatly reduce our performance issues in tarantool/metrics.

I've implemented only HistogramVec, with slight changes to the API:

[change] Users must specify label_names when creating a HistogramVec (HistogramVec:new()) and pass all label values to HistogramVec:observe(value, <kv-label_pairs>).
[change] HistogramVec:collect() is slower than Histogram:collect() because it has to copy Rust strings into Lua strings and allocate many Lua tables. However, :collect() is mainly used to export metrics via the HTTP server, where the result is converted into a string anyway, so HistogramVec:collect_str() can be used there instead: it directly builds the final strings in a format Prometheus understands.
[change] To allow dynamic reconfiguration of global_label_pairs, a Rust-side Registry is implemented that stores them. Wherever global_label_pairs are reset, a copy of them must be passed into Rust.
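To make the first two API changes concrete, here is a minimal, std-only Rust sketch of a labeled histogram whose label names are fixed at creation and whose collect_str() renders the Prometheus text format directly instead of building intermediate tables. It does not use the prometheus crate or the PoC's real Lua bindings: the struct, the method signatures, and the explicit global_labels parameter (standing in for the Rust-side Registry) are hypothetical, and HELP/TYPE lines and label-value escaping are left out.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Per-series state for one combination of label values.
struct Series {
    counts: Vec<u64>, // non-cumulative count per finite upper bound
    sum: f64,
    count: u64, // total observations; doubles as the +Inf bucket
}

/// Hypothetical HistogramVec: label names are fixed at creation and
/// observe() must receive exactly one value per label name.
struct HistogramVec {
    name: String,
    label_names: Vec<String>,
    bounds: Vec<f64>,                      // finite, ascending upper bounds; +Inf is implicit
    series: BTreeMap<Vec<String>, Series>, // BTreeMap keeps output order stable
}

impl HistogramVec {
    fn new(name: &str, label_names: &[&str], bounds: &[f64]) -> Self {
        HistogramVec {
            name: name.to_string(),
            label_names: label_names.iter().map(|s| s.to_string()).collect(),
            bounds: bounds.to_vec(),
            series: BTreeMap::new(),
        }
    }

    fn observe(&mut self, value: f64, label_values: &[&str]) -> Result<(), String> {
        if label_values.len() != self.label_names.len() {
            return Err(format!(
                "expected {} label values, got {}",
                self.label_names.len(),
                label_values.len()
            ));
        }
        let key: Vec<String> = label_values.iter().map(|s| s.to_string()).collect();
        let n = self.bounds.len();
        let s = self
            .series
            .entry(key)
            .or_insert_with(|| Series { counts: vec![0; n], sum: 0.0, count: 0 });
        // Only the first bucket whose bound covers the value is incremented;
        // values above every bound are counted only in `count` (+Inf).
        if let Some(i) = self.bounds.iter().position(|&b| value <= b) {
            s.counts[i] += 1;
        }
        s.sum += value;
        s.count += 1;
        Ok(())
    }

    /// Renders all series in the Prometheus text exposition format
    /// (HELP/TYPE lines and label-value escaping omitted for brevity).
    fn collect_str(&self, global_labels: &[(&str, &str)]) -> String {
        let mut out = String::new();
        for (values, s) in &self.series {
            // Global labels first, then this series' own labels.
            let mut pairs: Vec<String> = global_labels
                .iter()
                .map(|(k, v)| format!("{}=\"{}\"", k, v))
                .collect();
            pairs.extend(
                self.label_names.iter().zip(values).map(|(k, v)| format!("{}=\"{}\"", k, v)),
            );
            let base = pairs.join(",");
            let sep = if base.is_empty() { "" } else { "," };
            let braces = if base.is_empty() { String::new() } else { format!("{{{}}}", base) };
            // Buckets are exported cumulatively, as Prometheus expects.
            let mut cumulative: u64 = 0;
            for (b, c) in self.bounds.iter().zip(&s.counts) {
                cumulative += *c;
                writeln!(out, "{}_bucket{{{}{}le=\"{}\"}} {}", self.name, base, sep, b, cumulative).unwrap();
            }
            writeln!(out, "{}_bucket{{{}{}le=\"+Inf\"}} {}", self.name, base, sep, s.count).unwrap();
            writeln!(out, "{}_sum{} {}", self.name, braces, s.sum).unwrap();
            writeln!(out, "{}_count{} {}", self.name, braces, s.count).unwrap();
        }
        out
    }
}

fn main() {
    // Label names are declared once, at creation time.
    let mut h = HistogramVec::new("http_latency", &["ok"], &[0.1, 1.0]);
    h.observe(0.0625, &["true"]).unwrap();
    h.observe(2.0, &["false"]).unwrap();
    print!("{}", h.collect_str(&[("alias", "router")]));
}
```

Keeping series in a BTreeMap keyed by label values is only a choice made for deterministic output in this sketch; the real crate may organize series differently.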

[X] Tests
[ ] Changelog
[ ] Documentation (README and rst)
[X] Rockspec and rpm spec

PoC for #461

Motivation - Performance increase (Results)

| Scenario | Histogram:observe() | HistogramVec:observe() | Histogram:collect() | HistogramVec:collect_str() |
| --- | --- | --- | --- | --- |
| No labels | 57K/s | 3686K/s (+6366%) | 103K/s | 151K/s (+46%) |
| 1 label | 48K/s | 1235K/s (+2473%) | 53K/s | 75K/s (+41%) |

No labels: the histogram is created without custom labels. The point is that users who do not use labels should not pay for them.

1 label: the histogram has a single label key. This is mostly used to add a boolean ok = true|false label.
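One way to get the "users who do not use labels should not pay for them" property is to choose the storage layout once, at creation time. The sketch below is a hypothetical std-only illustration, not the PoC's code: with no labels, observe() updates a single preallocated series; with labels, it builds an owned key and looks it up in a HashMap. This is in line with the benchmark output further down, where histogram_vec:observe reports 0 B/op without labels and 192 B/op with one label, although where the PoC's allocations actually come from is not shown here.

```rust
use std::collections::HashMap;

/// Bucket counters plus sum and count for one series.
#[derive(Debug)]
struct Series {
    buckets: Vec<u64>, // non-cumulative count per finite upper bound
    sum: f64,
    count: u64,
}

impl Series {
    fn new(n: usize) -> Self {
        Series { buckets: vec![0; n], sum: 0.0, count: 0 }
    }

    fn observe(&mut self, bounds: &[f64], v: f64) {
        if let Some(i) = bounds.iter().position(|&b| v <= b) {
            self.buckets[i] += 1;
        }
        self.sum += v;
        self.count += 1;
    }
}

/// Storage layout is chosen once, when the histogram is created.
enum Storage {
    /// No labels: a single series, no hashing and no allocation per observe.
    Unlabeled(Series),
    /// Labels: series keyed by their label values.
    Labeled(HashMap<Vec<String>, Series>),
}

/// Hypothetical histogram type used only for this illustration.
struct Hist {
    bounds: Vec<f64>,
    storage: Storage,
}

impl Hist {
    fn new(bounds: &[f64], label_count: usize) -> Self {
        let storage = if label_count == 0 {
            Storage::Unlabeled(Series::new(bounds.len()))
        } else {
            Storage::Labeled(HashMap::new())
        };
        Hist { bounds: bounds.to_vec(), storage }
    }

    fn observe(&mut self, v: f64, labels: &[&str]) {
        let n = self.bounds.len();
        match &mut self.storage {
            Storage::Unlabeled(s) => {
                debug_assert!(labels.is_empty());
                s.observe(&self.bounds, v);
            }
            Storage::Labeled(map) => {
                // Building the owned key allocates on every call; this is
                // the per-label cost that the unlabeled path avoids.
                let key: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
                map.entry(key).or_insert_with(|| Series::new(n)).observe(&self.bounds, v);
            }
        }
    }

    fn total_count(&self) -> u64 {
        match &self.storage {
            Storage::Unlabeled(s) => s.count,
            Storage::Labeled(map) => map.values().map(|s| s.count).sum(),
        }
    }
}

fn main() {
    let mut plain = Hist::new(&[0.1, 1.0], 0);
    plain.observe(0.05, &[]);
    let mut by_ok = Hist::new(&[0.1, 1.0], 1);
    by_ok.observe(0.05, &["true"]);
    by_ok.observe(2.0, &["false"]);
    println!("{} {}", plain.total_count(), by_ok.total_count());
}
```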

The following table shows the performance increase when collecting all histogram collectors at once (an overall gather).

| Lua:collect() | Rust:collect() |
| --- | --- |
| 32K/s | 49K/s (+53%) |
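The Rust-side Registry from the API changes and the overall gather measured above can be pictured roughly as follows. This is a hypothetical std-only sketch (the Collector trait and the Registry, set_global_labels, and gather names are illustrative, not the PoC's API): the registry owns its own copy of the global label pairs, which is why Lua has to pass a fresh copy whenever they change, and gather() concatenates each collector's text output into one string.

```rust
/// Hypothetical collector interface: each collector renders itself in the
/// Prometheus text format, given the registry's current global labels.
trait Collector {
    fn collect_str(&self, global_labels: &[(String, String)]) -> String;
}

/// Hypothetical Rust-side registry. It owns a copy of the global label
/// pairs, so callers must pass a new copy whenever they are reconfigured.
struct Registry {
    global_labels: Vec<(String, String)>,
    collectors: Vec<Box<dyn Collector>>,
}

impl Registry {
    fn new() -> Self {
        Registry { global_labels: Vec::new(), collectors: Vec::new() }
    }

    /// Replaces the stored global labels with an owned copy of `pairs`.
    fn set_global_labels(&mut self, pairs: &[(&str, &str)]) {
        self.global_labels = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
    }

    fn register(&mut self, c: Box<dyn Collector>) {
        self.collectors.push(c);
    }

    /// Concatenates every collector's output into a single string.
    fn gather(&self) -> String {
        self.collectors
            .iter()
            .map(|c| c.collect_str(&self.global_labels))
            .collect()
    }
}

/// Toy collector used only for this example: one constant sample.
struct Const {
    name: &'static str,
    value: u64,
}

impl Collector for Const {
    fn collect_str(&self, global: &[(String, String)]) -> String {
        let labels: Vec<String> = global.iter().map(|(k, v)| format!("{}=\"{}\"", k, v)).collect();
        if labels.is_empty() {
            format!("{} {}\n", self.name, self.value)
        } else {
            format!("{}{{{}}} {}\n", self.name, labels.join(","), self.value)
        }
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register(Box::new(Const { name: "requests", value: 3 }));
    reg.set_global_labels(&[("alias", "router-1")]);
    print!("{}", reg.gather());
}
```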

.rocks/bin/luabench -d 3s
Tarantool version: Tarantool 3.3.0-0-g5fc82b8
Tarantool build: Darwin-arm64-RelWithDebInfo (static)
Tarantool build flags:  -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common  -fmacro-prefix-map=/var/folders/8x/1m5v3n6d4mn62g9w_65vvt_r0000gn/T/tarantool_install1980638789=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -Wno-cast-function-type -O2 -g -DNDEBUG -ggdb -O2 
CPU: Apple M1 @ 8
JIT: Disabled
Duration: 3s
Global timeout: 60

--- BENCH: histogram_bench::bench_001_no_labels_001_observe:histogram:observe
  206723             17634 ns/op             56709 op/s     4872 B/op   +960.50MB
--- BENCH: histogram_bench::bench_001_no_labels_001_observe:histogram_vec:observe
13372211               271.3 ns/op         3686003 op/s        0 B/op   +928B


--- BENCH: histogram_bench::bench_001_no_labels_002_collect:histogram:collect
  370499              9687 ns/op            103231 op/s     4239 B/op   +1498.14MB
--- BENCH: histogram_bench::bench_001_no_labels_002_collect:histogram_vec:collect
   91363             39864 ns/op             25085 op/s    21455 B/op   +1869.47MB
--- BENCH: histogram_bench::bench_001_no_labels_002_collect:histogram_vec:collect_str
  549039              6597 ns/op            151582 op/s       96 B/op   +50.27MB

--- BENCH: histogram_bench::bench_002_one_label_001_observe:histogram:observe
  176461             20869 ns/op             47918 op/s     4967 B/op   +836.05MB
--- BENCH: histogram_bench::bench_002_one_label_001_observe:histogram_vec:observe
 4459330               809.3 ns/op         1235665 op/s      192 B/op   +816.53MB



--- BENCH: histogram_bench::bench_002_one_label_002_collect:histogram:collect
  204015             18952 ns/op             52764 op/s     8151 B/op   +1586.09MB
--- BENCH: histogram_bench::bench_002_one_label_002_collect:histogram_vec:collect
   43171             83987 ns/op             11907 op/s    45431 B/op   +1870.49MB
--- BENCH: histogram_bench::bench_002_one_label_002_collect:histogram_vec:collect_str
  273251             13182 ns/op             75861 op/s       96 B/op   +25.02MB


--- BENCH: histogram_bench::bench_003_lua_gather
  113821             31461 ns/op             31785 op/s    13512 B/op   +1466.71MB
--- BENCH: histogram_bench::bench_003_rust_gather
  175913             20369 ns/op             49094 op/s       96 B/op   +16.11MB
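The percentages in the tables above are relative throughput increases, (new op/s / old op/s - 1) * 100. The small helper below (hypothetical, not part of luabench) extracts the op/s column from a result line and computes that increase. Computed from the unrounded op/s values, it gives slightly different numbers than the rounded K/s figures in the tables, for example about +6400% instead of +6366% for observe without labels.

```rust
/// Extracts the op/s figure from one luabench result line, e.g.
/// "13372211   271.3 ns/op   3686003 op/s   0 B/op   +928B".
/// Assumes the number immediately precedes the "op/s" token.
fn ops_per_sec(line: &str) -> Option<f64> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    let idx = tokens.iter().position(|&t| t == "op/s")?;
    tokens.get(idx.checked_sub(1)?)?.parse().ok()
}

/// Relative throughput increase in percent: (new / old - 1) * 100.
fn increase_pct(old: f64, new: f64) -> f64 {
    (new / old - 1.0) * 100.0
}

fn main() {
    let lua = "  206723             17634 ns/op             56709 op/s     4872 B/op   +960.50MB";
    let rust = "13372211               271.3 ns/op         3686003 op/s        0 B/op   +928B";
    let (old, new) = (ops_per_sec(lua).unwrap(), ops_per_sec(rust).unwrap());
    println!("{:+.0}%", increase_pct(old, new));
}
```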

The build of the Rust part is already orchestrated with CMake, so make test, make .rocks, and the other targets should work fine. You can reproduce my benchmarks with make bench.

What's left to decide:

Should metrics-rs live in tarantool/metrics, or is it better to move it into a separate repo? (Can we change the tarantool/metrics API for that?)
Bundling into the Tarantool binary

Mar 03 '25 18:03 ochaton
