opentelemetry-collector-contrib
[WIP][exporter/elasticsearch] Use exporterhelper/batchsender for reliability
Description:
WIP.
Blocked by:
- batch by byte in https://github.com/open-telemetry/opentelemetry-collector/issues/8122
  - batching by byte count is an existing capability of the elasticsearch exporter; shipping without it would be a breaking change, since it removes a feature.
- https://github.com/open-telemetry/opentelemetry-collector/issues/9952
- https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32359
Link to tracking Issue: Fixes #32377
Testing:
Documentation:
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Expecting a lot of README merge conflicts with #33367
Still needs a bit of work to make the benchmarks work, because the benchmark measures the generated count instead of the exported doc count. Since sending_queue is enabled, the rate of Consume* calls will effectively be infinite until the sending queue is full.
@carsonip should we disable the queue for benchmarks, so they're focused just on the ES exporter perf?
> @carsonip should we disable the queue for benchmarks, so they're focused just on the ES exporter perf?
I believe it will still break the benchmarks, because the batch sender blocks the Consume* call and the bench is designed to call Consume* serially instead of in parallel, resulting in only 1 request in each batch flushed every flush_interval (batcher.flush_timeout). Either way, the benchmarks will need to be updated.
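For context, a minimal sketch of the two benchmark shapes being discussed, assuming a hypothetical generateLogs helper and an exporter satisfying the collector's consumer.Logs interface (this is not the actual integrationtest code): the serial loop can only ever contribute one in-flight request per flush, while the RunParallel form from #33621 lets the batch sender merge concurrent requests.

```go
package integrationtest

import (
	"context"
	"testing"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
)

// generateLogs is a hypothetical helper that builds a fixed-size logs batch.
func generateLogs(batchSize int) plog.Logs { return plog.NewLogs() }

// Serial shape: each ConsumeLogs call blocks until the batch sender flushes,
// so at most one request ends up in each flushed batch.
func benchmarkSerial(b *testing.B, exp consumer.Logs, batchSize int) {
	logs := generateLogs(batchSize)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = exp.ConsumeLogs(context.Background(), logs)
	}
}

// Parallel shape (as in #33621): concurrent ConsumeLogs calls can be merged
// by the batch sender into a single bulk request.
func benchmarkParallel(b *testing.B, exp consumer.Logs, batchSize int) {
	logs := generateLogs(batchSize)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_ = exp.ConsumeLogs(context.Background(), logs)
		}
	})
}
```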
> the benchmark measures the generated count instead of the exported doc count
IIRC, I did this because I wanted to avoid decoding the bulk request on the ES side and polluting the benchmark profiling. The benchmarks assume that all other tests pass (i.e. the exporter is in working condition) and calculate events/s using the total events generated and the total bench time.
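For illustration, a sketch of how that events/s figure can be derived (the reportThroughput helper is hypothetical, not the actual benchmark code), assuming a Go toolchain that provides testing.B.Elapsed:

```go
package integrationtest

import "testing"

// Sketch only: compute events/s from the number of generated events and the
// total bench time, instead of counting documents decoded on the ES side.
func reportThroughput(b *testing.B, eventsPerOp int) {
	totalEvents := eventsPerOp * b.N
	b.ReportMetric(float64(totalEvents)/b.Elapsed().Seconds(), "events/s")
}
```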
> the rate of Consume* calls will effectively be infinite until the sending queue is full
Will it be possible to monitor the state of the queue? If we are able to wait until the queue is empty at the end of the benchmark, then we should be golden.
> should we disable the queue for benchmarks, so they're focused just on the ES exporter perf?
+1 on this. I think we should have benchmarks for both cases (with and without the queue), given that we can make the queue benchmark report correct numbers by waiting for the queue to be empty as the signal to end the benchmark.
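A rough sketch of that wait-for-drain idea, assuming the mock ES bulk handler increments an atomic counter per received document (both exportedDocs and waitForDrain are hypothetical names, not part of the current test harness):

```go
package integrationtest

import (
	"sync/atomic"
	"testing"
	"time"
)

// exportedDocs would be incremented by the mock ES bulk handler for every
// document it receives (hypothetical; not the actual test harness).
var exportedDocs atomic.Int64

// waitForDrain blocks until the sending queue has been fully exported, i.e.
// every generated document has reached the mock ES backend, so the measured
// time covers the whole export and docs/s matches events/s.
func waitForDrain(b *testing.B, expectedDocs int64, timeout time.Duration) {
	deadline := time.Now().Add(timeout)
	for exportedDocs.Load() < expectedDocs {
		if time.Now().After(deadline) {
			b.Fatalf("queue not drained: %d/%d docs exported", exportedDocs.Load(), expectedDocs)
		}
		time.Sleep(time.Millisecond)
	}
}
```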
Benchmark result compared to main with #33621:
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter/integrationtest
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
│ main.out │ batchsender.out │
│ sec/op │ sec/op vs base │
Exporter/logs/none/small_batch-16 31.65µ ± 1% 67.06µ ± 31% +111.89% (p=0.002 n=6)
Exporter/logs/none/medium_batch-16 309.0µ ± 5% 670.9µ ± 57% +117.11% (p=0.002 n=6)
Exporter/logs/none/large_batch-16 3.012m ± 6% 10.727m ± 2% +256.17% (p=0.002 n=6)
Exporter/logs/none/xlarge_batch-16 34.74m ± 2% 107.16m ± 1% +208.47% (p=0.002 n=6)
Exporter/logs/ecs/small_batch-16 28.29µ ± 3% 47.77µ ± 48% +68.89% (p=0.002 n=6)
Exporter/logs/ecs/medium_batch-16 278.8µ ± 11% 824.9µ ± 43% +195.86% (p=0.002 n=6)
Exporter/logs/ecs/large_batch-16 2.811m ± 7% 8.381m ± 2% +198.11% (p=0.002 n=6)
Exporter/logs/ecs/xlarge_batch-16 33.28m ± 3% 82.66m ± 3% +148.39% (p=0.002 n=6)
Exporter/logs/raw/small_batch-16 32.07µ ± 6% 61.46µ ± 18% +91.62% (p=0.002 n=6)
Exporter/logs/raw/medium_batch-16 306.6µ ± 4% 971.4µ ± 45% +216.82% (p=0.002 n=6)
Exporter/logs/raw/large_batch-16 2.987m ± 5% 9.450m ± 3% +216.43% (p=0.002 n=6)
Exporter/logs/raw/xlarge_batch-16 34.73m ± 2% 95.05m ± 2% +173.73% (p=0.002 n=6)
Exporter/traces/none/small_batch-16 31.44µ ± 8% 69.13µ ± 37% +119.91% (p=0.002 n=6)
Exporter/traces/none/medium_batch-16 316.8µ ± 10% 689.0µ ± 63% +117.46% (p=0.002 n=6)
Exporter/traces/none/large_batch-16 3.181m ± 5% 11.444m ± 2% +259.81% (p=0.002 n=6)
Exporter/traces/none/xlarge_batch-16 33.99m ± 3% 113.22m ± 4% +233.06% (p=0.002 n=6)
Exporter/traces/ecs/small_batch-16 31.39µ ± 3% 67.03µ ± 22% +113.56% (p=0.002 n=6)
Exporter/traces/ecs/medium_batch-16 299.5µ ± 6% 687.8µ ± 55% +129.61% (p=0.002 n=6)
Exporter/traces/ecs/large_batch-16 3.054m ± 3% 11.368m ± 2% +272.20% (p=0.002 n=6)
Exporter/traces/ecs/xlarge_batch-16 33.63m ± 2% 112.89m ± 4% +235.67% (p=0.002 n=6)
Exporter/traces/raw/small_batch-16 30.31µ ± 2% 62.30µ ± 48% +105.58% (p=0.002 n=6)
Exporter/traces/raw/medium_batch-16 287.2µ ± 3% 620.2µ ± 74% +115.94% (p=0.002 n=6)
Exporter/traces/raw/large_batch-16 3.176m ± 9% 10.662m ± 8% +235.67% (p=0.002 n=6)
Exporter/traces/raw/xlarge_batch-16 34.15m ± 4% 106.13m ± 4% +210.80% (p=0.002 n=6)
geomean 988.4µ 2.631m +166.17%
│ main.out │ batchsender.out │
│ docs/s │ docs/s vs base │
Exporter/logs/none/small_batch-16 316.0k ± 1% 149.1k ± 24% -52.80% (p=0.002 n=6)
Exporter/logs/none/medium_batch-16 323.6k ± 5% 123.9k ± 67% -61.72% (p=0.002 n=6)
Exporter/logs/none/large_batch-16 332.04k ± 6% 55.22k ± 1% -83.37% (p=0.002 n=6)
Exporter/logs/none/xlarge_batch-16 287.87k ± 2% 48.53k ± 8% -83.14% (p=0.002 n=6)
Exporter/logs/ecs/small_batch-16 353.5k ± 3% 209.5k ± 33% -40.74% (p=0.002 n=6)
Exporter/logs/ecs/medium_batch-16 358.69k ± 10% 70.51k ± 135% -80.34% (p=0.002 n=6)
Exporter/logs/ecs/large_batch-16 355.72k ± 6% 58.47k ± 2% -83.56% (p=0.002 n=6)
Exporter/logs/ecs/xlarge_batch-16 300.55k ± 3% 56.49k ± 17% -81.21% (p=0.002 n=6)
Exporter/logs/raw/small_batch-16 312.1k ± 7% 162.7k ± 22% -47.87% (p=0.002 n=6)
Exporter/logs/raw/medium_batch-16 326.14k ± 3% 60.88k ± 160% -81.33% (p=0.002 n=6)
Exporter/logs/raw/large_batch-16 334.88k ± 5% 59.23k ± 45% -82.31% (p=0.002 n=6)
Exporter/logs/raw/xlarge_batch-16 287.97k ± 2% 53.11k ± 2% -81.56% (p=0.002 n=6)
Exporter/traces/none/small_batch-16 318.1k ± 7% 144.7k ± 27% -54.52% (p=0.002 n=6)
Exporter/traces/none/medium_batch-16 315.62k ± 9% 98.08k ± 61% -68.93% (p=0.002 n=6)
Exporter/traces/none/large_batch-16 314.46k ± 5% 42.12k ± 5% -86.60% (p=0.002 n=6)
Exporter/traces/none/xlarge_batch-16 294.19k ± 3% 45.93k ± 7% -84.39% (p=0.002 n=6)
Exporter/traces/ecs/small_batch-16 318.6k ± 3% 149.3k ± 22% -53.14% (p=0.002 n=6)
Exporter/traces/ecs/medium_batch-16 333.84k ± 5% 97.21k ± 39% -70.88% (p=0.002 n=6)
Exporter/traces/ecs/large_batch-16 327.41k ± 3% 41.61k ± 2% -87.29% (p=0.002 n=6)
Exporter/traces/ecs/xlarge_batch-16 297.34k ± 2% 44.76k ± 9% -84.95% (p=0.002 n=6)
Exporter/traces/raw/small_batch-16 330.0k ± 2% 160.5k ± 32% -51.35% (p=0.002 n=6)
Exporter/traces/raw/medium_batch-16 348.2k ± 3% 113.8k ± 61% -67.32% (p=0.002 n=6)
Exporter/traces/raw/large_batch-16 315.00k ± 10% 45.11k ± 8% -85.68% (p=0.002 n=6)
Exporter/traces/raw/xlarge_batch-16 292.86k ± 4% 40.81k ± 4% -86.07% (p=0.002 n=6)
geomean 320.0k 76.98k -75.94%
│ main.out │ batchsender.out │
│ events/s │ events/s vs base │
Exporter/logs/none/small_batch-16 316.0k ± 1% 149.1k ± 24% -52.80% (p=0.002 n=6)
Exporter/logs/none/medium_batch-16 323.6k ± 5% 149.3k ± 36% -53.87% (p=0.002 n=6)
Exporter/logs/none/large_batch-16 332.04k ± 6% 93.23k ± 2% -71.92% (p=0.002 n=6)
Exporter/logs/none/xlarge_batch-16 287.87k ± 2% 93.32k ± 1% -67.58% (p=0.002 n=6)
Exporter/logs/ecs/small_batch-16 353.5k ± 3% 209.5k ± 33% -40.74% (p=0.002 n=6)
Exporter/logs/ecs/medium_batch-16 358.7k ± 10% 121.2k ± 76% -66.20% (p=0.002 n=6)
Exporter/logs/ecs/large_batch-16 355.7k ± 6% 119.3k ± 2% -66.46% (p=0.002 n=6)
Exporter/logs/ecs/xlarge_batch-16 300.6k ± 3% 121.0k ± 3% -59.75% (p=0.002 n=6)
Exporter/logs/raw/small_batch-16 312.1k ± 7% 162.7k ± 22% -47.87% (p=0.002 n=6)
Exporter/logs/raw/medium_batch-16 326.1k ± 3% 102.9k ± 81% -68.44% (p=0.002 n=6)
Exporter/logs/raw/large_batch-16 334.9k ± 5% 105.8k ± 3% -68.40% (p=0.002 n=6)
Exporter/logs/raw/xlarge_batch-16 288.0k ± 2% 105.2k ± 2% -63.47% (p=0.002 n=6)
Exporter/traces/none/small_batch-16 318.1k ± 7% 144.7k ± 27% -54.52% (p=0.002 n=6)
Exporter/traces/none/medium_batch-16 315.6k ± 9% 146.0k ± 39% -53.74% (p=0.002 n=6)
Exporter/traces/none/large_batch-16 314.46k ± 5% 87.38k ± 2% -72.21% (p=0.002 n=6)
Exporter/traces/none/xlarge_batch-16 294.19k ± 3% 88.32k ± 4% -69.98% (p=0.002 n=6)
Exporter/traces/ecs/small_batch-16 318.6k ± 3% 149.3k ± 22% -53.14% (p=0.002 n=6)
Exporter/traces/ecs/medium_batch-16 333.8k ± 5% 145.4k ± 35% -56.45% (p=0.002 n=6)
Exporter/traces/ecs/large_batch-16 327.41k ± 3% 87.97k ± 2% -73.13% (p=0.002 n=6)
Exporter/traces/ecs/xlarge_batch-16 297.34k ± 2% 88.59k ± 4% -70.21% (p=0.002 n=6)
Exporter/traces/raw/small_batch-16 330.0k ± 2% 160.5k ± 32% -51.35% (p=0.002 n=6)
Exporter/traces/raw/medium_batch-16 348.2k ± 3% 161.3k ± 42% -53.69% (p=0.002 n=6)
Exporter/traces/raw/large_batch-16 315.00k ± 10% 93.79k ± 9% -70.22% (p=0.002 n=6)
Exporter/traces/raw/xlarge_batch-16 292.86k ± 4% 94.24k ± 4% -67.82% (p=0.002 n=6)
geomean 320.0k 120.3k -62.42%
│ main.out │ batchsender.out │
│ B/op │ B/op vs base │
Exporter/logs/none/small_batch-16 61.69Ki ± 0% 71.60Ki ± 1% +16.07% (p=0.002 n=6)
Exporter/logs/none/medium_batch-16 610.5Ki ± 0% 728.1Ki ± 4% +19.26% (p=0.002 n=6)
Exporter/logs/none/large_batch-16 5.974Mi ± 0% 7.079Mi ± 0% +18.50% (p=0.002 n=6)
Exporter/logs/none/xlarge_batch-16 59.48Mi ± 0% 69.75Mi ± 1% +17.28% (p=0.002 n=6)
Exporter/logs/ecs/small_batch-16 53.24Ki ± 0% 61.58Ki ± 2% +15.65% (p=0.002 n=6)
Exporter/logs/ecs/medium_batch-16 526.2Ki ± 0% 642.3Ki ± 4% +22.07% (p=0.002 n=6)
Exporter/logs/ecs/large_batch-16 5.135Mi ± 0% 6.440Mi ± 0% +25.42% (p=0.002 n=6)
Exporter/logs/ecs/xlarge_batch-16 51.26Mi ± 0% 63.91Mi ± 7% +24.67% (p=0.002 n=6)
Exporter/logs/raw/small_batch-16 60.29Ki ± 0% 69.71Ki ± 2% +15.62% (p=0.002 n=6)
Exporter/logs/raw/medium_batch-16 598.1Ki ± 0% 704.8Ki ± 3% +17.84% (p=0.002 n=6)
Exporter/logs/raw/large_batch-16 5.849Mi ± 0% 6.836Mi ± 2% +16.87% (p=0.002 n=6)
Exporter/logs/raw/xlarge_batch-16 58.32Mi ± 0% 67.95Mi ± 0% +16.50% (p=0.002 n=6)
Exporter/traces/none/small_batch-16 63.12Ki ± 0% 77.48Ki ± 2% +22.75% (p=0.002 n=6)
Exporter/traces/none/medium_batch-16 625.5Ki ± 0% 786.8Ki ± 2% +25.80% (p=0.002 n=6)
Exporter/traces/none/large_batch-16 6.120Mi ± 0% 7.693Mi ± 1% +25.71% (p=0.002 n=6)
Exporter/traces/none/xlarge_batch-16 61.18Mi ± 0% 78.36Mi ± 1% +28.07% (p=0.002 n=6)
Exporter/traces/ecs/small_batch-16 63.16Ki ± 0% 76.90Ki ± 2% +21.75% (p=0.002 n=6)
Exporter/traces/ecs/medium_batch-16 625.5Ki ± 0% 796.8Ki ± 4% +27.38% (p=0.002 n=6)
Exporter/traces/ecs/large_batch-16 6.136Mi ± 0% 7.705Mi ± 0% +25.56% (p=0.002 n=6)
Exporter/traces/ecs/xlarge_batch-16 61.12Mi ± 0% 77.64Mi ± 2% +27.04% (p=0.002 n=6)
Exporter/traces/raw/small_batch-16 62.26Ki ± 1% 75.88Ki ± 3% +21.89% (p=0.002 n=6)
Exporter/traces/raw/medium_batch-16 616.2Ki ± 0% 770.6Ki ± 2% +25.04% (p=0.002 n=6)
Exporter/traces/raw/large_batch-16 6.014Mi ± 0% 7.534Mi ± 2% +25.26% (p=0.002 n=6)
Exporter/traces/raw/xlarge_batch-16 60.22Mi ± 0% 74.45Mi ± 0% +23.62% (p=0.002 n=6)
geomean 1.856Mi 2.261Mi +21.83%
│ main.out │ batchsender.out │
│ allocs/op │ allocs/op vs base │
Exporter/logs/none/small_batch-16 350.0 ± 0% 342.0 ± 0% -2.29% (p=0.002 n=6)
Exporter/logs/none/medium_batch-16 3.417k ± 0% 3.404k ± 0% -0.37% (p=0.002 n=6)
Exporter/logs/none/large_batch-16 34.07k ± 0% 34.03k ± 0% -0.11% (p=0.002 n=6)
Exporter/logs/none/xlarge_batch-16 340.8k ± 0% 340.3k ± 0% -0.14% (p=0.002 n=6)
Exporter/logs/ecs/small_batch-16 190.0 ± 0% 182.0 ± 0% -4.21% (p=0.002 n=6)
Exporter/logs/ecs/medium_batch-16 1.816k ± 0% 1.804k ± 0% -0.66% (p=0.002 n=6)
Exporter/logs/ecs/large_batch-16 18.07k ± 0% 18.03k ± 0% -0.26% (p=0.002 n=6)
Exporter/logs/ecs/xlarge_batch-16 180.8k ± 0% 180.2k ± 0% -0.29% (p=0.002 n=6)
Exporter/logs/raw/small_batch-16 290.0 ± 0% 282.0 ± 0% -2.76% (p=0.002 n=6)
Exporter/logs/raw/medium_batch-16 2.814k ± 0% 2.797k ± 0% -0.60% (p=0.002 n=6)
Exporter/logs/raw/large_batch-16 28.05k ± 0% 27.95k ± 0% -0.33% (p=0.002 n=6)
Exporter/logs/raw/xlarge_batch-16 280.7k ± 0% 280.1k ± 0% -0.21% (p=0.002 n=6)
Exporter/traces/none/small_batch-16 390.0 ± 0% 382.0 ± 0% -2.05% (p=0.002 n=6)
Exporter/traces/none/medium_batch-16 3.817k ± 0% 3.805k ± 0% -0.31% (p=0.002 n=6)
Exporter/traces/none/large_batch-16 38.08k ± 0% 38.04k ± 0% -0.11% (p=0.002 n=6)
Exporter/traces/none/xlarge_batch-16 380.8k ± 0% 380.4k ± 0% -0.10% (p=0.002 n=6)
Exporter/traces/ecs/small_batch-16 390.0 ± 0% 382.0 ± 0% -2.05% (p=0.002 n=6)
Exporter/traces/ecs/medium_batch-16 3.817k ± 0% 3.805k ± 0% -0.31% (p=0.002 n=6)
Exporter/traces/ecs/large_batch-16 38.08k ± 0% 38.04k ± 0% -0.10% (p=0.002 n=6)
Exporter/traces/ecs/xlarge_batch-16 380.8k ± 0% 380.4k ± 0% -0.11% (p=0.002 n=6)
Exporter/traces/raw/small_batch-16 370.0 ± 0% 362.0 ± 0% -2.16% (p=0.002 n=6)
Exporter/traces/raw/medium_batch-16 3.616k ± 0% 3.605k ± 0% -0.30% (p=0.002 n=6)
Exporter/traces/raw/large_batch-16 36.08k ± 0% 36.04k ± 0% -0.13% (p=0.002 n=6)
Exporter/traces/raw/xlarge_batch-16 360.8k ± 0% 360.3k ± 0% -0.14% (p=0.002 n=6)
geomean 9.916k 9.832k -0.84%
Also an important point to note: with this PR the docs/s is not always equal to events/s, which may indicate a bug. Will dig into it later.
> Also an important point to note: with this PR the docs/s is not always equal to events/s, which may indicate a bug. Will dig into it later.
There was an uncaught "http request body too large" error due to the receiver's default max request bytes setting. Fixed by https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32632/commits/4006cd9699ba054a2ee35e80f9e808511a9c6872
> Benchmark result compared to main with https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33621:
I looked into the perf regression and figured out its root cause. On main, events are first encoded, then added to the bulk indexer for batching. When run with RunParallel in #33621, events in different Consume* calls are encoded in parallel. With this batch sender PR, on the other hand, events are first batched, then encoded by a single goroutine. This under-utilizes the CPU when GOMAXPROCS > 1.
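To make the contrast concrete, here is a schematic sketch of where the encoding work runs in each case (all names are hypothetical; the real code path goes through the bulk indexer and exporterhelper):

```go
package encodingsketch

import "sync"

type event struct{}

// encode stands in for serializing one event into an ES bulk-request item.
func encode(event) []byte { return nil }

// mainBranch: encoding happens inside each Consume* call, so concurrent
// callers encode in parallel before the documents are batched.
func mainBranch(consumeCalls [][]event) [][]byte {
	var (
		mu  sync.Mutex
		out [][]byte
		wg  sync.WaitGroup
	)
	for _, events := range consumeCalls {
		wg.Add(1)
		go func(events []event) { // one goroutine per Consume* call
			defer wg.Done()
			for _, e := range events {
				doc := encode(e)
				mu.Lock()
				out = append(out, doc)
				mu.Unlock()
			}
		}(events)
	}
	wg.Wait()
	return out
}

// batchSenderBranch: Consume* calls only enqueue; the merged batch is encoded
// later by a single goroutine, under-utilizing the CPU when GOMAXPROCS > 1.
func batchSenderBranch(mergedBatch []event) [][]byte {
	out := make([][]byte, 0, len(mergedBatch))
	for _, e := range mergedBatch {
		out = append(out, encode(e))
	}
	return out
}
```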
Playing devil's advocate: do we want to fully use CPUs on the exporter side, or is there an advantage to tying up only one goroutine on encoding? Are there benefits to leaving CPU on the table for other processors etc., or to reduced memory usage by consolidating encoding?
Sorry, ignore the question about reduced memory usage, as the bench already has bytes and allocs in it.
> Playing devil's advocate: do we want to fully use CPUs on the exporter side, or is there an advantage to tying up only one goroutine on encoding? Are there benefits to leaving CPU on the table for other processors etc., or to reduced memory usage by consolidating encoding?
@codefromthecrypt That's actually a very good question. I could have squeezed out every bit of CPU and performance by encoding each log record / span in a separate goroutine in parallel, which would "fix" the "perf regression" mentioned above. But the reality is that the encoding work itself is inefficient, and that is the part that actually needs to be fixed before we try to mask the inefficiency by parallelizing it.
That's also why, after discussing with the team, this PR will stay as is, and the perf improvements will come later as follow-up PRs.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Superseded by #34238