tsdb: add support for OOO exemplars in CircularExemplarStorage
What this PR does
Support out-of-order exemplar ingestion in the CircularExemplarStorage. The out-of-order window is controlled by the regular OOOTimeWindow flag for samples.
AddExemplar
During ingestion, exemplars are appended to / discarded from the circular buffer as usual, but their position in the (doubly) linked list is controlled to maintain temporal ordering. Adding elements to the head or the tail of the linked list is trivial (and optimized), adding elements in the middle requires us to find an insertion point, which is achieved by traversing the linked list. A back link was introduced (doubly linked list) which dramatically speeds up finding the insertion point (out-of-order exemplars lie usually very close to the newest exemplar).
Resize
Grow extends the buffer to the right and rearranges entries. Shrink the buffer if it has enough space to accommodate the new size by trimming it. Otherwise, we calculate a specific range to delete starting from ce.nextIndex to (ce.nextIndex + (oldSize - newSize)) % oldSize. Entries at these indices are removed and the remaining one are rearranged. Resizing to 0 (disable exemplars) returns early.
When rearranging buffer entries we adjust next/prev and index.newest/oldest pointers to point to the correct entries. This is handled in copyExemplarRanges.
Benchmarks
BenchmarkResizeExemplars
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M4 Pro
│ /tmp/bench_resize_old │ /tmp/bench_resize_new │
│ sec/op │ sec/op vs base │
ResizeExemplars/grow-100000-to-200000-14 2276.8µ ± 14% 850.0µ ± 2% -62.67% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14 1019.3µ ± 1% 281.1µ ± 3% -72.42% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14 22.86m ± 3% 11.09m ± 3% -51.49% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14 10.064m ± 1% 3.040m ± 1% -69.80% (p=0.002 n=6)
geomean 4.807m 1.685m -64.95%
│ /tmp/bench_resize_old │ /tmp/bench_resize_new │
│ B/op │ B/op vs base │
ResizeExemplars/grow-100000-to-200000-14 11.12Mi ± 0% 12.21Mi ± 0% +9.82% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14 2.784Mi ± 0% 3.055Mi ± 0% +9.74% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14 113.6Mi ± 0% 122.1Mi ± 0% +7.43% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14 28.45Mi ± 0% 30.52Mi ± 0% +7.30% (p=0.002 n=6)
geomean 17.79Mi 19.31Mi +8.56%
│ /tmp/bench_resize_old │ /tmp/bench_resize_new │
│ allocs/op │ allocs/op vs base │
ResizeExemplars/grow-100000-to-200000-14 1036.000 ± 0% 3.000 ± 0% -99.71% (p=0.002 n=6)
ResizeExemplars/shrink-100000-to-50000-14 511.000 ± 0% 2.000 ± 0% -99.61% (p=0.002 n=6)
ResizeExemplars/grow-1000000-to-2000000-14 10517.000 ± 0% 4.000 ± 0% -99.96% (p=0.002 n=6)
ResizeExemplars/shrink-1000000-to-500000-14 5131.000 ± 0% 2.000 ± 0% -99.96% (p=0.002 n=6)
geomean 2.312k 2.632 -99.89%
BenchmarkAddExemplar
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M4 Pro
│ /tmp/bench_add_ex_old │ /tmp/bench_add_ex_new2 │
│ sec/op │ sec/op vs base │
AddExemplar/10000/1000-14 413.1µ ± 2% 421.1µ ± 10% +1.96% (p=0.015 n=6)
AddExemplar/100000/1000-14 4.225m ± 1% 4.133m ± 5% ~ (p=0.065 n=6)
AddExemplar/1000000/1000-14 42.93m ± 1% 41.94m ± 0% -2.30% (p=0.002 n=6)
AddExemplar/10000/10000-14 440.2µ ± 0% 430.9µ ± 0% -2.12% (p=0.002 n=6)
AddExemplar/100000/10000-14 4.243m ± 0% 4.149m ± 1% -2.21% (p=0.002 n=6)
AddExemplar/1000000/10000-14 43.90m ± 0% 43.41m ± 5% ~ (p=0.065 n=6)
AddExemplar/10000/100000-14 485.9µ ± 1% 462.8µ ± 0% -4.75% (p=0.002 n=6)
AddExemplar/100000/100000-14 4.407m ± 2% 4.192m ± 0% -4.89% (p=0.002 n=6)
AddExemplar/1000000/100000-14 45.43m ± 0% 44.45m ± 1% -2.16% (p=0.002 n=6)
AddExemplar_OutOfOrder/empty/reverse-14 39.27m ± 1%
AddExemplar_OutOfOrder/empty/out-of-order-14 44.95m ± 0%
AddExemplar_OutOfOrder/empty/multi-in-order-14 459.8µ ± 1%
AddExemplar_OutOfOrder/empty/multi-reverse-14 461.1µ ± 0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14 605.1µ ± 1%
AddExemplar_OutOfOrder/empty/in-order-14 38.86m ± 0%
AddExemplar_OutOfOrder/full-one/out-of-order-14 44.92m ± 0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14 462.7µ ± 1%
AddExemplar_OutOfOrder/full-one/multi-reverse-14 462.9µ ± 0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14 608.0µ ± 1%
AddExemplar_OutOfOrder/full-one/in-order-14 38.88m ± 0%
AddExemplar_OutOfOrder/full-one/reverse-14 39.29m ± 0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14 467.9µ ± 1%
geomean 4.384m 4.004m -2.22%
│ /tmp/bench_add_ex_old │ /tmp/bench_add_ex_new2 │
│ B/op │ B/op vs base │
AddExemplar/10000/1000-14 9.735Ki ± 0% 9.735Ki ± 0% ~ (p=0.636 n=6)
AddExemplar/100000/1000-14 98.80Ki ± 0% 98.80Ki ± 0% ~ (p=0.054 n=6)
AddExemplar/1000000/1000-14 1012.9Ki ± 0% 1012.9Ki ± 0% -0.00% (p=0.032 n=6)
AddExemplar/10000/10000-14 9.740Ki ± 0% 9.740Ki ± 0% ~ (p=0.675 n=6)
AddExemplar/100000/10000-14 98.80Ki ± 0% 98.81Ki ± 0% ~ (p=0.591 n=6)
AddExemplar/1000000/10000-14 1012.9Ki ± 0% 1012.9Ki ± 0% ~ (p=0.126 n=6)
AddExemplar/10000/100000-14 9.742Ki ± 0% 9.743Ki ± 0% ~ (p=0.411 n=6)
AddExemplar/100000/100000-14 98.80Ki ± 0% 98.80Ki ± 0% ~ (p=0.065 n=6)
AddExemplar/1000000/100000-14 1012.9Ki ± 0% 1012.9Ki ± 0% -0.00% (p=0.048 n=6)
AddExemplar_OutOfOrder/empty/reverse-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/empty/out-of-order-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-in-order-14 312.0Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-reverse-14 312.0Ki ± 0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14 352.5Ki ± 0%
AddExemplar_OutOfOrder/empty/in-order-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/out-of-order-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14 312.0Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-reverse-14 312.0Ki ± 0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14 352.5Ki ± 0%
AddExemplar_OutOfOrder/full-one/in-order-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/full-one/reverse-14 39.06Ki ± 0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14 311.8Ki ± 0%
geomean 99.15Ki 112.0Ki +0.00%
│ /tmp/bench_add_ex_old │ /tmp/bench_add_ex_new2 │
│ allocs/op │ allocs/op vs base │
AddExemplar/10000/1000-14 499.0 ± 0% 499.0 ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/100000/1000-14 4.999k ± 0% 4.999k ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/1000000/1000-14 50.00k ± 0% 50.00k ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/10000/10000-14 499.0 ± 0% 499.0 ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/100000/10000-14 4.999k ± 0% 4.999k ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/1000000/10000-14 50.00k ± 0% 50.00k ± 0% ~ (p=1.000 n=6)
AddExemplar/10000/100000-14 499.0 ± 0% 499.0 ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/100000/100000-14 4.999k ± 0% 4.999k ± 0% ~ (p=1.000 n=6) ¹
AddExemplar/1000000/100000-14 50.00k ± 0% 50.00k ± 0% ~ (p=0.242 n=6)
AddExemplar_OutOfOrder/empty/reverse-14 5.000k ± 0%
AddExemplar_OutOfOrder/empty/out-of-order-14 5.000k ± 0%
AddExemplar_OutOfOrder/empty/multi-in-order-14 19.90k ± 0%
AddExemplar_OutOfOrder/empty/multi-reverse-14 19.91k ± 0%
AddExemplar_OutOfOrder/empty/multi-out-of-order-14 21.67k ± 0%
AddExemplar_OutOfOrder/empty/in-order-14 5.000k ± 0%
AddExemplar_OutOfOrder/full-one/out-of-order-14 5.000k ± 0%
AddExemplar_OutOfOrder/full-one/multi-in-order-14 19.90k ± 0%
AddExemplar_OutOfOrder/full-one/multi-reverse-14 19.91k ± 0%
AddExemplar_OutOfOrder/full-one/multi-out-of-order-14 21.67k ± 0%
AddExemplar_OutOfOrder/full-one/in-order-14 5.000k ± 0%
AddExemplar_OutOfOrder/full-one/reverse-14 5.000k ± 0%
AddExemplar_OutOfOrder/full-multiple/multi-reverse-14 19.90k ± 0%
geomean 4.996k 7.818k -0.00%
¹ all samples are equal
Which issue(s) does the PR fix:
Closes https://github.com/prometheus/prometheus/issues/13577
Does this PR introduce a user-facing change?
[FEATURE] Circular exemplar buffer supports out-of-order exemplars.
@dimitarvdimitrov I consolidated the shrink and grow functions around copyExemplarRanges which handles copying exemplars and applying shifts.
Summary of resize operation on the circular buffer (_ indicating empty entries):
Growing:
-
aaa<next>bbbbb->bbbbbaaa<next>____
Shrinking:
-
aaa<next>____<next+diff>->aaa<next>__(if there is enough room to shrink) -
aaa<next>bbb<next+diff>cc->ccaaa<next>__(if(next + diff) < l) -
aaa<next+diff>bb<next>ccc->bb<next>__(else)
/prombench v3.7.3
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️
Compared versions: PR-17469 and v3.7.3
After the successful deployment (check status here), the benchmarking results can be viewed at:
Available Commands:
- To restart benchmark:
/prombench restart v3.7.3 - To stop benchmark:
/prombench cancel - To print help:
/prombench help
/prombench cancel
Benchmark cancel is in progress.