prometheus icon indicating copy to clipboard operation
prometheus copied to clipboard

tsdb: add support for OOO exemplars in CircularExemplarStorage

Open juliusmh opened this issue 2 months ago • 6 comments

What this PR does

Support out-of-order exemplar ingestion in the CircularExemplarStorage. The out-of-order window is controlled by the regular OOOTimeWindow flag for samples.

AddExemplar

During ingestion, exemplars are appended to / discarded from the circular buffer as usual, but their position in the (doubly) linked list is controlled to maintain temporal ordering. Adding elements to the head or the tail of the linked list is trivial (and optimized), adding elements in the middle requires us to find an insertion point, which is achieved by traversing the linked list. A back link was introduced (doubly linked list) which dramatically speeds up finding the insertion point (out-of-order exemplars lie usually very close to the newest exemplar).

Resize

Grow extends the buffer to the right and rearranges entries. Shrink the buffer if it has enough space to accommodate the new size by trimming it. Otherwise, we calculate a specific range to delete starting from ce.nextIndex to (ce.nextIndex + (oldSize - newSize)) % oldSize. Entries at these indices are removed and the remaining one are rearranged. Resizing to 0 (disable exemplars) returns early.

When rearranging buffer entries we adjust next/prev and index.newest/oldest pointers to point to the correct entries. This is handled in copyExemplarRanges.

Benchmarks

BenchmarkResizeExemplars
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M4 Pro
                                            │ /tmp/bench_resize_old │       /tmp/bench_resize_new        │
                                            │        sec/op         │   sec/op     vs base               │
ResizeExemplars/grow-100000-to-200000-14              2276.8µ ± 14%   850.0µ ± 2%  -62.67% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14             1019.3µ ±  1%   281.1µ ± 3%  -72.42% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14             22.86m ±  3%   11.09m ± 3%  -51.49% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14           10.064m ±  1%   3.040m ± 1%  -69.80% (p=0.002 n=6)
geomean                                                4.807m         1.685m       -64.95%

                                            │ /tmp/bench_resize_old │       /tmp/bench_resize_new        │
                                            │         B/op          │     B/op      vs base              │
ResizeExemplars/grow-100000-to-200000-14               11.12Mi ± 0%   12.21Mi ± 0%  +9.82% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14              2.784Mi ± 0%   3.055Mi ± 0%  +9.74% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14             113.6Mi ± 0%   122.1Mi ± 0%  +7.43% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14            28.45Mi ± 0%   30.52Mi ± 0%  +7.30% (p=0.002 n=6)
geomean                                                17.79Mi        19.31Mi       +8.56%

                                            │ /tmp/bench_resize_old │       /tmp/bench_resize_new       │
                                            │       allocs/op       │ allocs/op   vs base               │
ResizeExemplars/grow-100000-to-200000-14              1036.000 ± 0%   3.000 ± 0%  -99.71% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14              511.000 ± 0%   2.000 ± 0%  -99.61% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14           10517.000 ± 0%   4.000 ± 0%  -99.96% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14           5131.000 ± 0%   2.000 ± 0%  -99.96% (p=0.002 n=6)
geomean                                                 2.312k        2.632       -99.89%

BenchmarkAddExemplar
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M4 Pro
                                                      │ /tmp/bench_add_ex_old │       /tmp/bench_add_ex_new2       │
                                                      │        sec/op         │    sec/op     vs base              │
AddExemplar/10000/1000-14                                         413.1µ ± 2%   421.1µ ± 10%  +1.96% (p=0.015 n=6)
AddExemplar/100000/1000-14                                        4.225m ± 1%   4.133m ±  5%       ~ (p=0.065 n=6)
AddExemplar/1000000/1000-14                                       42.93m ± 1%   41.94m ±  0%  -2.30% (p=0.002 n=6)
AddExemplar/10000/10000-14                                        440.2µ ± 0%   430.9µ ±  0%  -2.12% (p=0.002 n=6)
AddExemplar/100000/10000-14                                       4.243m ± 0%   4.149m ±  1%  -2.21% (p=0.002 n=6)
AddExemplar/1000000/10000-14                                      43.90m ± 0%   43.41m ±  5%       ~ (p=0.065 n=6)
AddExemplar/10000/100000-14                                       485.9µ ± 1%   462.8µ ±  0%  -4.75% (p=0.002 n=6)
AddExemplar/100000/100000-14                                      4.407m ± 2%   4.192m ±  0%  -4.89% (p=0.002 n=6)
AddExemplar/1000000/100000-14                                     45.43m ± 0%   44.45m ±  1%  -2.16% (p=0.002 n=6)
AddExemplar_OutOfOrder/empty/reverse-14                                         39.27m ±  1%
AddExemplar_OutOfOrder/empty/out-of-order-14                                    44.95m ±  0%
AddExemplar_OutOfOrder/empty/multi-in-order-14                                  459.8µ ±  1%
AddExemplar_OutOfOrder/empty/multi-reverse-14                                   461.1µ ±  0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14                              605.1µ ±  1%
AddExemplar_OutOfOrder/empty/in-order-14                                        38.86m ±  0%
AddExemplar_OutOfOrder/full-one/out-of-order-14                                 44.92m ±  0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14                               462.7µ ±  1%
AddExemplar_OutOfOrder/full-one/multi-reverse-14                                462.9µ ±  0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14                           608.0µ ±  1%
AddExemplar_OutOfOrder/full-one/in-order-14                                     38.88m ±  0%
AddExemplar_OutOfOrder/full-one/reverse-14                                      39.29m ±  0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14                           467.9µ ±  1%
geomean                                                           4.384m        4.004m        -2.22%

                                                      │ /tmp/bench_add_ex_old │       /tmp/bench_add_ex_new2        │
                                                      │         B/op          │     B/op       vs base              │
AddExemplar/10000/1000-14                                        9.735Ki ± 0%    9.735Ki ± 0%       ~ (p=0.636 n=6)
AddExemplar/100000/1000-14                                       98.80Ki ± 0%    98.80Ki ± 0%       ~ (p=0.054 n=6)
AddExemplar/1000000/1000-14                                     1012.9Ki ± 0%   1012.9Ki ± 0%  -0.00% (p=0.032 n=6)
AddExemplar/10000/10000-14                                       9.740Ki ± 0%    9.740Ki ± 0%       ~ (p=0.675 n=6)
AddExemplar/100000/10000-14                                      98.80Ki ± 0%    98.81Ki ± 0%       ~ (p=0.591 n=6)
AddExemplar/1000000/10000-14                                    1012.9Ki ± 0%   1012.9Ki ± 0%       ~ (p=0.126 n=6)
AddExemplar/10000/100000-14                                      9.742Ki ± 0%    9.743Ki ± 0%       ~ (p=0.411 n=6)
AddExemplar/100000/100000-14                                     98.80Ki ± 0%    98.80Ki ± 0%       ~ (p=0.065 n=6)
AddExemplar/1000000/100000-14                                   1012.9Ki ± 0%   1012.9Ki ± 0%  -0.00% (p=0.048 n=6)
AddExemplar_OutOfOrder/empty/reverse-14                                          39.06Ki ± 0%
AddExemplar_OutOfOrder/empty/out-of-order-14                                     39.06Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-in-order-14                                   312.0Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-reverse-14                                    312.0Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14                               352.5Ki ± 0%
AddExemplar_OutOfOrder/empty/in-order-14                                         39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/out-of-order-14                                  39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14                                312.0Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-reverse-14                                 312.0Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14                            352.5Ki ± 0%
AddExemplar_OutOfOrder/full-one/in-order-14                                      39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/reverse-14                                       39.06Ki ± 0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14                            311.8Ki ± 0%
geomean                                                          99.15Ki         112.0Ki       +0.00%

                                                      │ /tmp/bench_add_ex_old │       /tmp/bench_add_ex_new2        │
                                                      │       allocs/op       │  allocs/op   vs base                │
AddExemplar/10000/1000-14                                          499.0 ± 0%    499.0 ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/100000/1000-14                                        4.999k ± 0%   4.999k ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/1000000/1000-14                                       50.00k ± 0%   50.00k ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/10000/10000-14                                         499.0 ± 0%    499.0 ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/100000/10000-14                                       4.999k ± 0%   4.999k ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/1000000/10000-14                                      50.00k ± 0%   50.00k ± 0%       ~ (p=1.000 n=6)
AddExemplar/10000/100000-14                                        499.0 ± 0%    499.0 ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/100000/100000-14                                      4.999k ± 0%   4.999k ± 0%       ~ (p=1.000 n=6) ¹
AddExemplar/1000000/100000-14                                     50.00k ± 0%   50.00k ± 0%       ~ (p=0.242 n=6)
AddExemplar_OutOfOrder/empty/reverse-14                                         5.000k ± 0%
AddExemplar_OutOfOrder/empty/out-of-order-14                                    5.000k ± 0%
AddExemplar_OutOfOrder/empty/multi-in-order-14                                  19.90k ± 0%
AddExemplar_OutOfOrder/empty/multi-reverse-14                                   19.91k ± 0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14                              21.67k ± 0%
AddExemplar_OutOfOrder/empty/in-order-14                                        5.000k ± 0%
AddExemplar_OutOfOrder/full-one/out-of-order-14                                 5.000k ± 0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14                               19.90k ± 0%
AddExemplar_OutOfOrder/full-one/multi-reverse-14                                19.91k ± 0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14                           21.67k ± 0%
AddExemplar_OutOfOrder/full-one/in-order-14                                     5.000k ± 0%
AddExemplar_OutOfOrder/full-one/reverse-14                                      5.000k ± 0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14                           19.90k ± 0%
geomean                                                           4.996k        7.818k       -0.00%
¹ all samples are equal

Which issue(s) does the PR fix:

Closes https://github.com/prometheus/prometheus/issues/13577

Does this PR introduce a user-facing change?

[FEATURE] Circular exemplar buffer supports out-of-order exemplars. 

juliusmh avatar Nov 03 '25 12:11 juliusmh

@dimitarvdimitrov I consolidated the shrink and grow functions around copyExemplarRanges which handles copying exemplars and applying shifts.

juliusmh avatar Nov 18 '25 01:11 juliusmh

Summary of resize operation on the circular buffer (_ indicating empty entries):

Growing:

  • aaa<next>bbbbb -> bbbbbaaa<next>____

Shrinking:

  • aaa<next>____<next+diff> -> aaa<next>__ (if there is enough room to shrink)
  • aaa<next>bbb<next+diff>cc -> ccaaa<next>__ (if (next + diff) < l)
  • aaa<next+diff>bb<next>ccc -> bb<next>__ (else)

juliusmh avatar Nov 21 '25 18:11 juliusmh

/prombench v3.7.3

krajorama avatar Nov 26 '25 09:11 krajorama

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-17469 and v3.7.3

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

  • To restart benchmark: /prombench restart v3.7.3
  • To stop benchmark: /prombench cancel
  • To print help: /prombench help

prombot avatar Nov 26 '25 09:11 prombot

/prombench cancel

krajorama avatar Dec 03 '25 08:12 krajorama

Benchmark cancel is in progress.

prombot avatar Dec 03 '25 08:12 prombot