[WIP] Common Expression Language (CEL) sampler
This pull request introduces a new CelBasedSampler to the OpenTelemetry Java SDK, enabling advanced sampling rules using the Common Expression Language (CEL). It also includes updates to the existing RuleBasedRoutingSampler for consistency and clarity. Below is a summary of the most important changes grouped by theme:
New CelBasedSampler Implementation:
- Added the
CelBasedSamplerclass, which evaluates CEL expressions to make sampling decisions based on span attributes. It includes a fallback sampler and supports multiple expressions. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSampler.java) - Introduced the
CelBasedSamplerBuilderto constructCelBasedSamplerinstances with methods for adding expressions and specifying actions (e.g.,DROPorRECORD_AND_SAMPLE). (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplerBuilder.java) - Added the
CelBasedSamplingExpressionclass to encapsulate individual CEL expressions and their associated samplers. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplingExpression.java) - Implemented a declarative configuration provider for
CelBasedSampler, enabling configuration through YAML files. (samplers/src/main/java/io/opentelemetry/contrib/sampler/internal/CelBasedSamplerComponentProvider.java)
# The fallback sampler to use if no expressions match.
fallback_sampler:
always_on:
# List of CEL expressions to evaluate. Expressions are evaluated in order.
expressions:
# The action to take when the expression evaluates to true. Must be one of: DROP, RECORD_AND_SAMPLE.
- action: DROP
# The CEL expression to evaluate. Must return a boolean.
expression: attribute['url.path'].startsWith('/actuator')
- action: RECORD_AND_SAMPLE
expression: attribute['http.method'] == 'GET' && attribute['http.status_code'] < 400
Documentation Updates:
- Updated
samplers/README.mdto document the newCelBasedSampler, including its usage, schema, and example configurations. (samplers/README.md) [1] [2]
Dependency Additions:
- Added a dependency on the
dev.cel:cel:0.9.0library to enable CEL expression evaluation. (samplers/build.gradle.kts)
Updates to RuleBasedRoutingSampler:
- Renamed
SamplingRuletoRuleBasedRoutingSamplingRulefor clarity and updated all related references in theRuleBasedRoutingSamplerand its builder. (samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java,samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplerBuilder.java,samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplingRule.java) [1] [2] [3] [4] [5] [6] [7]
These changes collectively enhance the SDK's sampling capabilities, allowing users to define sophisticated sampling rules using CEL while maintaining compatibility with existing samplers.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: dol / name: Dominic Lüchinger (55c88a8bf64f9f18e6fbfda6587e206b74cf95f0, c863fe6fc237c2276789681bd2e216b204a14b02)
@dol this looks nice!
@johnbley what do you think of CEL vs JEXL in the context of https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/13590?
Just a note to myself. I need to expand the test cases with the following addition.
- Define multiple expressions to test the handling and ordering of expressions
- Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and action should be take affect.
- Add more complex expressions that pass the compile step and should work in the verify step like
attribute['http.status_code'] < 400and time based example to sample outside of business hours
@dol this looks nice!
@johnbley what do you think of CEL vs JEXL in the context of open-telemetry/opentelemetry-java-instrumentation#13590?
Very interesting! I'll take a look at it!
Just a note to myself. I need to expand the test cases with the following addition.
* Define multiple expressions to test the handling and ordering of expressions * Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and action should be take affect. * Add more complex expressions that pass the compile step and should work in the verify step like `attribute['http.status_code'] < 400` and time based example to sample outside of business hours
As I refined the tests I found some edge cases in the setup and mapping of the CEL engine. I'll mark the PR as work in progress. Sorry for that.
I'll mark the PR as work in progress. Sorry for that.
no worries! you can click "Convert to draft" (hidden under the list of reviewers), and then when you're ready you can click "Ready for review"
@trask @jack-berg : I wanted you to give an update on the PR. As I was adding more tests cases to PR I encountered a problem with single/double quote parsing on the declarative config. First I though it's an issue with a complex expression input. But after digging deep into the source code of this project and opentelementry-java I figured out, that the https://opentelemetry.io/docs/specs/otel/configuration/data-model/#environment-variable-substitution implementation on top of snakeyaml can not handle a mix of single and double quotes very well.
I created the following bug report: https://github.com/open-telemetry/opentelemetry-java/issues/7429
As the chances are very high that a complex expression will need to mix single and double quotes, this bug should be addressed first.
https://github.com/open-telemetry/opentelemetry-java/pull/7433 should fix this regression.
The latest version added more unit tests and better coverage. I think the new CEL sampler is ready for an review. The tests still require the https://github.com/open-telemetry/opentelemetry-java/pull/7433 to be approved, merged and released. Until then the build and tests will fail.
Build is pretty broken. Can you look at this @dol thanks!
The latest version added more unit tests and better coverage. I think the new CEL sampler is ready for an review. The tests still require the open-telemetry/opentelemetry-java#7433 to be approved, merged and released. Until then the build and tests will fail.
@breedx-splk I'm waiting for the v1.52.0 release, which should be released soon. This will fix all the broken tests.
@jack-berg @trask The PR is ready for review after the fix was merged and release ( https://github.com/open-telemetry/opentelemetry-java/pull/7433 ) and the BOM version was bumped ( https://github.com/open-telemetry/opentelemetry-java-contrib/commit/850933fe2b9d69f5a63364474d389a00074a80a2#diff-df7d51fc1db73056c56958a9784e26310dae8ec239fb940820f5ddea4b655693L5 )
@breedx-splk Thank you for the valuable review. I addressed all the remarks. The newest versions returns the detailed message if the CEL compiles is unable to parse the expression.
Sorry for the lag @dol -- it's a busy time. Need you to resolve a few conflicts now, thanks!
@breedx-splk no problem. Conflicts are resolved after rebase fails on my side.
In addition bumped the dev.cel:cel dependency to the latest release.
@dol looks like there are some merge conflicts now...
@dol looks like there are some merge conflicts now...
@breedx-splk Thx for the hint. The merge conflict is resolved.
I've added this PR as a topic for tomorrow's Java SIG meeting to get a few more eyes on it, in particular since this module is automatically pulled in and available in the OpenTelemetry Java agent (@dol you're of course welcome to join if you'd like!)
I'm not against putting this into contrib at all but wonder if a better long-term approach would be to have a Java implementation of an ottl engine (or a large part of one) for this purpose? For instance, the collector's tailsamplingprocessor reuses ottl for its ottl_conditions. If the same syntax, functions, data model, and scoping semconv could be applied at both the language-level and the collector it would be a big win for coherency.
For clarity, I'm raising a possible future idea, not trying to derail the current work here which looks good!
@breedx-splk Thanks for the clarification on how to go forward. I'll go ahead and perform the requested changes to outsource the CEL sampler to it's own gradle module.
I was not aware that the OTEL collector already has a similar approeach in complex filtering with it's own DSL, the OpenTelemetry Transformation Language (OTTL).
I agree that I would make sense to have the one common syntax like the OTTL for processing and sampling.
@breedx-splk It's now it's own dedicated gradle module. I messed up the rebase and it included more commits then expected. As a consequence the github-action bot included a lot more reviewers. 🙇 Sorry for spamming additional code owners.