apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

feat: 'elasticsearchCaptureBodyUrls' config option

Open trentm opened this issue 3 years ago • 2 comments

This adds a new config option, per the 'elasticsearch_capture_body_urls' cross-agent spec, to control for which ES client requests the request body is captured. By default it is just the body of search-related APIs. The use case for configuring this is to capture for all requests for reply in Kibana performance work. This option is not expected to be used often.

Closes: #2019 Refs: https://github.com/elastic/apm/issues/665 Refs: https://github.com/elastic/apm/pull/670

checklist

  • [x] finish implementation (docs, some tests)
  • [x] CHANGELOG entry
  • [x] TAV tests: https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/job/PR-2873/7/
  • [ ] perhaps wait for the spec to be merged

trentm avatar Aug 12 '22 20:08 trentm

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-12T23:15:21.852+0000

  • Duration: 34 min 28 sec

Test stats :test_tube:

Test Results
Failed 0
Passed 260042
Skipped 0
Total 260042

:robot: GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine avatar Aug 12 '22 20:08 apmmachine

run module tests for elasticsearch,@elastic/elasticsearch

trentm avatar Aug 12 '22 23:08 trentm

some premature optimization perf notes

I digressed slightly to sanity check that the change in ES URL path checking here (from a custom pathIsAQuery regexp, to a WildcardMatcher-created array of regexps) will not be a performance problem.

tl;dr: The WildcardMatcher-based comparison is much slower. However, it is on the order of 1 microsecond per path check on my MacBook Pro (2.4 GHz 8-Core Intel Core i9). Given that ES queries are remote calls and on the order of milliseconds, it would generally be premature optimization to bother getting fancy to improve performance of this.

details

Here is my benchmark script: https://gist.github.com/trentm/ba426e5a08317b18d40e897aeb536c0c

And a run:

% node bench-shouldCaptureBody.js
WilcardMatcher array:
	[/^.*\/_search$/i, /^.*\/_search\/template$/i, /^.*\/_msearch$/i, /^.*\/_msearch\/template$/i, /^.*\/_async_search$/i, /^.*\/_count$/i, /^.*\/_sql$/i, /^.*\/_eql\/search$/i]
Current pathIsAQuery RegExp: /\/(_search|_msearch|_count|_async_search|_sql|_eql)(\/|$)/
Naive WildcardMatcher in a single RegExp:
	/^(.*\/_search|.*\/_search\/template|.*\/_msearch|.*\/_msearch\/template|.*\/_async_search|.*\/_count|.*\/_sql|.*\/_eql\/search)$/i
customEquivReIgnoreCase:
	/\/(_search|_search\/template|_msearch|_msearch\/template|_async_search|_count|_sql|_eql\/search)$/i
customEquivRe:
	/\/(_search|_search\/template|_msearch|_msearch\/template|_async_search|_count|_sql|_eql\/search)$/

WildcardMatcher array x 719,208 ops/sec ±0.99% (90 runs sampled)
Current pathIsAQuery RegExp x 15,700,534 ops/sec ±0.97% (90 runs sampled)
Naive WildcardMatcher in a single RegExp x 856,094 ops/sec ±1.10% (88 runs sampled)
customEquivReIgnoreCase x 17,477,871 ops/sec ±1.37% (88 runs sampled)
customEquivRe x 17,942,769 ops/sec ±0.91% (88 runs sampled)
Fastest is customEquivRe
  1. The WildcardMatcher-based array of regexps that this PR changes to is ~20x slower than the current pathIsAQuery RegExp.
  2. A naive potential optimization might be to make a WildcardMatcher array config into a single regexp -- ^(pattern1|pattern2|...)$. This turns out to be not much faster, in this case.
  3. The major perf difference is that the (a) the WildcardMatcher necessarily always has a ^ anchor and (b) every pattern is leading with * to avoid that anchor. A possible optimization of WildcardMatcher usage would be to drop the ^ anchor and the leading * if and only if every pattern had that leading *. An equivalent optimization could be done for the '$' anchor and a trailing * on every pattern.
  4. There is a small perf win with dropping the case-insensitive flag. Arguably the default elasticsearch_capture_body_urls values could/should be case-sensitive. I don't believe the Elasticsearch REST API would accept POST /my-index/_SeArCh.

However, this is all fast enough, so I'm not proposing doing any WildcardMatcher perf improvements right now.

trentm avatar Aug 15 '22 19:08 trentm

@astorm the spec for this will get merged tomorrow, so feel free to review this when you get a chance. Thanks.

trentm avatar Aug 17 '22 20:08 trentm

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-18T23:49:18.790+0000

  • Duration: 21 min 41 sec

Test stats :test_tube:

Test Results
Failed 0
Passed 260042
Skipped 0
Total 260042

:robot: GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine avatar Aug 19 '22 00:08 apmmachine

[email protected] is release with this now

trentm avatar Oct 19 '22 15:10 trentm