opentelemetry-java-contrib icon indicating copy to clipboard operation
opentelemetry-java-contrib copied to clipboard

[WIP] Common Expression Language (CEL) sampler

Open dol opened this issue 6 months ago • 6 comments

This pull request introduces a new CelBasedSampler to the OpenTelemetry Java SDK, enabling advanced sampling rules using the Common Expression Language (CEL). It also includes updates to the existing RuleBasedRoutingSampler for consistency and clarity. Below is a summary of the most important changes grouped by theme:

New CelBasedSampler Implementation:

  • Added the CelBasedSampler class, which evaluates CEL expressions to make sampling decisions based on span attributes. It includes a fallback sampler and supports multiple expressions. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSampler.java)
  • Introduced the CelBasedSamplerBuilder to construct CelBasedSampler instances with methods for adding expressions and specifying actions (e.g., DROP or RECORD_AND_SAMPLE). (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplerBuilder.java)
  • Added the CelBasedSamplingExpression class to encapsulate individual CEL expressions and their associated samplers. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplingExpression.java)
  • Implemented a declarative configuration provider for CelBasedSampler, enabling configuration through YAML files. (samplers/src/main/java/io/opentelemetry/contrib/sampler/internal/CelBasedSamplerComponentProvider.java)
# The fallback sampler to use if no expressions match.
fallback_sampler:
  always_on:
# List of CEL expressions to evaluate. Expressions are evaluated in order.
expressions:
  # The action to take when the expression evaluates to true. Must be one of: DROP, RECORD_AND_SAMPLE.
  - action: DROP
    # The CEL expression to evaluate. Must return a boolean.
    expression: attribute['url.path'].startsWith('/actuator')
  - action: RECORD_AND_SAMPLE
    expression: attribute['http.method'] == 'GET' && attribute['http.status_code'] < 400

Documentation Updates:

  • Updated samplers/README.md to document the new CelBasedSampler, including its usage, schema, and example configurations. (samplers/README.md) [1] [2]

Dependency Additions:

  • Added a dependency on the dev.cel:cel:0.9.0 library to enable CEL expression evaluation. (samplers/build.gradle.kts)

Updates to RuleBasedRoutingSampler:

  • Renamed SamplingRule to RuleBasedRoutingSamplingRule for clarity and updated all related references in the RuleBasedRoutingSampler and its builder. (samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java, samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplerBuilder.java, samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplingRule.java) [1] [2] [3] [4] [5] [6] [7]

These changes collectively enhance the SDK's sampling capabilities, allowing users to define sophisticated sampling rules using CEL while maintaining compatibility with existing samplers.

dol avatar Jun 11 '25 15:06 dol

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: dol / name: Dominic Lüchinger (55c88a8bf64f9f18e6fbfda6587e206b74cf95f0, c863fe6fc237c2276789681bd2e216b204a14b02)

@dol this looks nice!

@johnbley what do you think of CEL vs JEXL in the context of https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/13590?

trask avatar Jun 11 '25 17:06 trask

Just a note to myself. I need to expand the test cases with the following addition.

  • Define multiple expressions to test the handling and ordering of expressions
    • Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and action should be take affect.
    • Add more complex expressions that pass the compile step and should work in the verify step like attribute['http.status_code'] < 400 and time based example to sample outside of business hours

dol avatar Jun 12 '25 07:06 dol

@dol this looks nice!

@johnbley what do you think of CEL vs JEXL in the context of open-telemetry/opentelemetry-java-instrumentation#13590?

Very interesting! I'll take a look at it!

johnbley avatar Jun 12 '25 12:06 johnbley

Just a note to myself. I need to expand the test cases with the following addition.

* Define multiple expressions to test the handling and ordering of expressions
  
  * Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and action should be take affect.
  * Add more complex expressions that pass the compile step and should work in the verify step like `attribute['http.status_code'] < 400` and time based example to sample outside of business hours

As I refined the tests I found some edge cases in the setup and mapping of the CEL engine. I'll mark the PR as work in progress. Sorry for that.

dol avatar Jun 12 '25 13:06 dol

I'll mark the PR as work in progress. Sorry for that.

no worries! you can click "Convert to draft" (hidden under the list of reviewers), and then when you're ready you can click "Ready for review"

trask avatar Jun 12 '25 14:06 trask

@trask @jack-berg : I wanted you to give an update on the PR. As I was adding more tests cases to PR I encountered a problem with single/double quote parsing on the declarative config. First I though it's an issue with a complex expression input. But after digging deep into the source code of this project and opentelementry-java I figured out, that the https://opentelemetry.io/docs/specs/otel/configuration/data-model/#environment-variable-substitution implementation on top of snakeyaml can not handle a mix of single and double quotes very well.

I created the following bug report: https://github.com/open-telemetry/opentelemetry-java/issues/7429

As the chances are very high that a complex expression will need to mix single and double quotes, this bug should be addressed first.

dol avatar Jun 18 '25 06:06 dol

https://github.com/open-telemetry/opentelemetry-java/pull/7433 should fix this regression.

dol avatar Jun 19 '25 06:06 dol

The latest version added more unit tests and better coverage. I think the new CEL sampler is ready for an review. The tests still require the https://github.com/open-telemetry/opentelemetry-java/pull/7433 to be approved, merged and released. Until then the build and tests will fail.

dol avatar Jun 23 '25 22:06 dol

Build is pretty broken. Can you look at this @dol thanks!

breedx-splk avatar Jul 07 '25 21:07 breedx-splk

The latest version added more unit tests and better coverage. I think the new CEL sampler is ready for an review. The tests still require the open-telemetry/opentelemetry-java#7433 to be approved, merged and released. Until then the build and tests will fail.

@breedx-splk I'm waiting for the v1.52.0 release, which should be released soon. This will fix all the broken tests.

dol avatar Jul 07 '25 22:07 dol

@jack-berg @trask The PR is ready for review after the fix was merged and release ( https://github.com/open-telemetry/opentelemetry-java/pull/7433 ) and the BOM version was bumped ( https://github.com/open-telemetry/opentelemetry-java-contrib/commit/850933fe2b9d69f5a63364474d389a00074a80a2#diff-df7d51fc1db73056c56958a9784e26310dae8ec239fb940820f5ddea4b655693L5 )

dol avatar Aug 06 '25 09:08 dol

@breedx-splk Thank you for the valuable review. I addressed all the remarks. The newest versions returns the detailed message if the CEL compiles is unable to parse the expression.

dol avatar Sep 02 '25 21:09 dol

Sorry for the lag @dol -- it's a busy time. Need you to resolve a few conflicts now, thanks!

breedx-splk avatar Sep 23 '25 23:09 breedx-splk

@breedx-splk no problem. Conflicts are resolved after rebase fails on my side. In addition bumped the dev.cel:cel dependency to the latest release.

dol avatar Sep 25 '25 21:09 dol

@dol looks like there are some merge conflicts now...

breedx-splk avatar Oct 01 '25 16:10 breedx-splk

@dol looks like there are some merge conflicts now...

@breedx-splk Thx for the hint. The merge conflict is resolved.

dol avatar Oct 01 '25 19:10 dol

I've added this PR as a topic for tomorrow's Java SIG meeting to get a few more eyes on it, in particular since this module is automatically pulled in and available in the OpenTelemetry Java agent (@dol you're of course welcome to join if you'd like!)

trask avatar Oct 01 '25 20:10 trask

I'm not against putting this into contrib at all but wonder if a better long-term approach would be to have a Java implementation of an ottl engine (or a large part of one) for this purpose? For instance, the collector's tailsamplingprocessor reuses ottl for its ottl_conditions. If the same syntax, functions, data model, and scoping semconv could be applied at both the language-level and the collector it would be a big win for coherency.

For clarity, I'm raising a possible future idea, not trying to derail the current work here which looks good!

johnbley avatar Oct 01 '25 23:10 johnbley

@breedx-splk Thanks for the clarification on how to go forward. I'll go ahead and perform the requested changes to outsource the CEL sampler to it's own gradle module.

I was not aware that the OTEL collector already has a similar approeach in complex filtering with it's own DSL, the OpenTelemetry Transformation Language (OTTL). I agree that I would make sense to have the one common syntax like the OTTL for processing and sampling.

dol avatar Oct 04 '25 20:10 dol

@breedx-splk It's now it's own dedicated gradle module. I messed up the rebase and it included more commits then expected. As a consequence the github-action bot included a lot more reviewers. 🙇 Sorry for spamming additional code owners.

dol avatar Oct 09 '25 21:10 dol