opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

JaegerRemoteSampler inconsistent Implementation

Open rotscher opened this issue 2 years ago • 6 comments

The implementation of SDK Jaeger Remote Sampler Extension is not according to the description of the example of Jaeger Remote Sampler.

{
  "service_strategies": [
    {
      "service": "foobar",
      "type": "ratelimiting",
      "param": 2
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.2,
    "operation_strategies": [
      {
        "operation": "/metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}

Having the remote sampling configuration above I would expect service foobar to be rate limited to 2 and all other services to be sampled with probability 0.2. As an exception operation /metrics of all services (including foobar) never gets sampled.

With the current implementation only rate limiting is configured for service foobar which is also applied to /metrics operation.

Also when operation_strategies are configured for a service then the default type of the service is always "probabilistic" even when "ratelimiting" is configured.

I've used the following libraries and versions:

  • io.opentelemetry:opentelemetry-sdk-extension-autoconfigure:1.26.0-alpha
  • io.opentelemetry:opentelemetry-sdk-extension-jaeger-remote-sampler:1.26.0-alpha

And activated the autoconfiguration with export OTEL_TRACES_SAMPLER=parentbased_jaeger_remote

rotscher avatar Jun 05 '23 14:06 rotscher

@pavolloffay can you comment on this?

jack-berg avatar Jun 05 '23 15:06 jack-berg

@jack-berg @jkwatson - I would like to try solving this if thats ok :)

kuujis avatar Mar 09 '24 09:03 kuujis

correct it should be fixed in OTEL SDK

pavolloffay avatar Mar 11 '24 10:03 pavolloffay

correct it should be fixed in OTEL SDK

I was debugging this over the weekend and to me it seems the issue is with jaeger not returning probablistic operation sampling strategies if the service configuration is ratelimiting. In theory it would be possible to call Jaeger twice, get strategies for a given service, then get defaults and merge them before starting to use them, but to me it looks like a mistake in Jaeger backend and this "calling jaeger twice" does not feel right.

kuujis avatar Mar 12 '24 12:03 kuujis

Just a small update - after updating jaeger to return the operation level default probablistic sampling strategies for ratelimiting service level sampling strategy configurations - the issue can't be reproduced anymore. So I assume this could be closed once the PR to jaeger gets accepted OR if it does not - I'll make a PR to otel with double jaeger call and merge of the results (I still feel a bit dirty about this option though)

kuujis avatar Mar 15 '24 15:03 kuujis

This should work out of the box with next release of Jaeger, provided flag sampling.strategies.bugfix-5270 is used when launching it. Since the fix might result in unexpected impact to the Jaeger users - this will become default behavior only in 1.57 version. See https://github.com/jaegertracing/jaeger/issues/5270 for more details.

kuujis avatar Apr 20 '24 19:04 kuujis