dd-opentracing-cpp

multi-service sampling rules for nginx ingress controller

Open • marcportabellaclotet-mt opened this issue 2 years ago • 14 comments

I have been trying to define different sampling rules per Kubernetes nginx ingress, without success.

What I have tried: defining sampling rules according to the Datadog docs. My config:

- name: DD_TRACE_SAMPLING_RULES
  value: '[{"service": "nginx","sample_rate": 0.2},{"name": "nginxdebug","sample_rate": 1}]'

Changing this value (service: nginx) works for all ingresses, but we cannot take advantage of the fine-grained per-service rules as described here.

I have added annotations to my ingress to override the service name.

nginx.ingress.kubernetes.io/configuration-snippet: |
   opentracing_tag service.name "nginxdebug";

Is this setup possible? Thanks
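
To make the question concrete, here is a minimal Python sketch of how a rule list like the one above might be evaluated. It assumes rules are checked in order, that the first match wins, and that a rule only constrains the fields it names ("service" against the span's service, "name" against its operation name). first_matching_rate is a hypothetical helper written for illustration, not part of dd-opentracing-cpp, and "nginx.handle" is the operation name that comes up later in this thread.

```python
import json

# Rule list copied from the DD_TRACE_SAMPLING_RULES value above.
RULES = json.loads(
    '[{"service": "nginx","sample_rate": 0.2},'
    '{"name": "nginxdebug","sample_rate": 1}]'
)


def first_matching_rate(rules, service, operation_name):
    """Return the sample_rate of the first matching rule, or None.

    Assumption: a rule matches when every field it specifies ("service"
    and/or "name") equals the span's value; omitted fields match anything.
    """
    for rule in rules:
        if "service" in rule and rule["service"] != service:
            continue
        if "name" in rule and rule["name"] != operation_name:
            continue
        return rule["sample_rate"]
    return None


# Span reported as service "nginx" with operation name "nginx.handle".
print(first_matching_rate(RULES, "nginx", "nginx.handle"))  # 0.2

# Span whose service was overridden to "nginxdebug": the second rule keys on
# "name" (operation name), not "service", so no rule matches here.
print(first_matching_rate(RULES, "nginxdebug", "nginx.handle"))  # None
```

Under these assumptions, the second rule would only apply to a span whose operation name is "nginxdebug"; matching a service override would need a rule keyed on "service" instead.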

marcportabellaclotet-mt · Dec 08 '22 01:12

I was able to configure different rules, by setting:

nginx.ingress.kubernetes.io/server-snippet: |
   opentracing_load_tracer /usr/local/lib/libdd_opentracing.so /etc/nginx/opentracing-debug.json;

And creating a new file in the ingress controller:

{
  "service": "nginxdebug",
  "agent_host": "datadog.datadog.svc.cluster.local",
  "agent_port": 8126,
  "environment": "prod",
  "operation_name_override": "nginx.handle",
  "sample_rate": 1,
  "dd.priority.sampling": true
}

marcportabellaclotet-mt · Dec 08 '22 01:12

Could you please share an nginx setup where DD_TRACE_SAMPLING_RULES is used with different services? I am not sure I understand how DD_TRACE_SAMPLING_RULES works when multiple services are used. Can DD_TRACE_SAMPLING_RULES apply different per-service rules when opentracing_tag "service.name" is set in the nginx config?

marcportabellaclotet-mt · Dec 08 '22 08:12

ingress-nginx is using an out-of-date version of open-tracing-cpp (v0.19.0). Could this be the reason why the sampling rules are not working?

marcportabellaclotet-mt · Dec 08 '22 11:12

Hi, Marc.

I don't have an example of using DD_TRACE_SAMPLING_RULES (or the equivalent dd-config.json) to configure sampling rules for multiple services in the nginx ingress controller for kubernetes.

It would be a nice example to have, though.

I looked through our code based on what you are trying to do, and noticed a few things:

  • The ability to set the trace's service by setting the "service.name" tag was added in dd-opentracing-cpp version v1.3.2. If you share your ingress-nginx version, then I can check the corresponding dd-opentracing-cpp version.
  • I think that your original technique, with the environment variable, should work, but I have not tested it.
    • Your second technique might not work because it could cause the plugin to be loaded more than once, which is not supported.

The first thing to do is get details about the software versions that you're using.

If based on that it's "supposed to work," then I can create a reproduction on a test kubernetes cluster. However, I can't promise I would be able to work on that anytime soon.

dgoffredo · Dec 08 '22 12:12

Thanks for the fast response. I confirm that I am using the latest nginx ingress version from helm chart 4.4.0, which uses DATADOG_CPP_VERSION=1.3.2. I compiled a new rootfs for the ingress controller using version 1.3.6, and after testing it I see the same behavior described in the messages above: using opentracing_tag service.name xxx does not work.

I ran nginx -s reload to verify which version is being used and how the dd tracer is configured:

nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:150
info: DATADOG TRACER CONFIGURATION - {"agent_url":"http://datadog.datadog.svc.cluster.local:8126","analytics_enabled":false,"analytics_sample_rate":null,"date":"2022-12-08T22:47:58+0000","enabled":true,"env":"prod","lang":"cpp","lang_version":"201402","operation_name_override":"nginx.handle","report_hostname":false,"sampling_rules":"[{\"service\": \"nginx\",\"sample_rate\": 0.2},{\"service\": \"nginxdebug\",\"sample_rate\": 1}]","service":"nginx","version":"v1.3.6"}
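
For anyone who wants to check the same thing on their own controller, here is a small Python sketch that pulls the effective rules out of that startup line. Note that "sampling_rules" is itself a JSON document stored as a string, so it has to be decoded twice. The log line below is an abbreviated copy of the one above (fields not needed here are omitted), and parse_tracer_config is a hypothetical helper name.

```python
import json

# Abbreviated copy of the startup line above; only some fields are kept.
LOG_LINE = (
    'info: DATADOG TRACER CONFIGURATION - '
    '{"env":"prod","operation_name_override":"nginx.handle",'
    r'"sampling_rules":"[{\"service\": \"nginx\",\"sample_rate\": 0.2},'
    r'{\"service\": \"nginxdebug\",\"sample_rate\": 1}]",'
    '"service":"nginx","version":"v1.3.6"}'
)

PREFIX = "DATADOG TRACER CONFIGURATION - "


def parse_tracer_config(line):
    """Decode the JSON payload that follows PREFIX in a tracer startup line."""
    _, _, payload = line.partition(PREFIX)
    config = json.loads(payload)
    # "sampling_rules" is a JSON string nested inside the JSON payload.
    config["sampling_rules"] = json.loads(config["sampling_rules"])
    return config


config = parse_tracer_config(LOG_LINE)
print(config["version"], config["service"])
for rule in config["sampling_rules"]:
    print(rule)
```

This confirms that both rules reached the tracer and that the reported library version is v1.3.6, so the rules are loaded even though the per-service rate is not applied.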

Thanks for your time; it would be great if you could do some testing on Kubernetes whenever you have time. On the other hand, it seems that the nginx ingress project is moving to OpenTelemetry soon.

marcportabellaclotet-mt · Dec 08 '22 23:12

Is it possible that DD_TRACE_SAMPLING_RULES checks the root service name, and that setting service.name as an opentracing tag only overrides the child span? Here is the flame graph when using the service.name tag:

image

I have also tested using "opentracing_trace_locations off;", which merges the nginx services into the same span.

image

marcportabellaclotet-mt · Dec 08 '22 23:12

Ooo, look at that.

Yes, opentracing_tag service.name will set the service name on the "current span," which, when opentracing_trace_locations is on, means the location span. The outer request span will be unaffected. It might depend on where in the nginx configuration opentracing_tag is called. In a location block it will certainly refer to the location span. Perhaps in the enclosing server block it would instead refer to the request span; I'm not sure.

In your second screenshot, there is only the nginxdebug service, which is promising. If you are able to go about configuring the ingress controller that way (opentracing_trace_locations off), maybe the sample rate for nginxdebug will be as you specified in the sampling rules.
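
As a rough illustration of the first screenshot's situation, here is a toy Python model. It is only a sketch of the behavior described above, not a reading of the plugin's code: it assumes that the sampling rate is chosen from the root (request) span's service, and that the service.name tag lands on the location span. Span, rate_for, and build_trace are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Rules as reported in the tracer's startup configuration.
RULES = [
    {"service": "nginx", "sample_rate": 0.2},
    {"service": "nginxdebug", "sample_rate": 1},
]


@dataclass
class Span:
    service: str
    children: List["Span"] = field(default_factory=list)


def rate_for(span: Span) -> Optional[float]:
    """First rule whose "service" equals the span's service (assumed semantics)."""
    for rule in RULES:
        if rule.get("service") == span.service:
            return rule["sample_rate"]
    return None


def build_trace(default_service: str, tagged_service: str) -> Span:
    """Model opentracing_trace_locations on: request span plus location span.

    Assumption: the service.name tag applies to the location (child) span
    only, so the request (root) span keeps the default service.
    """
    root = Span(service=default_service)
    root.children.append(Span(service=tagged_service))
    return root


trace = build_trace("nginx", "nginxdebug")
print("root span service:", trace.service)
print("location span service:", trace.children[0].service)
# If the decision is keyed on the root span, the "nginx" rule applies.
print("rate chosen from root span:", rate_for(trace))
```

In this model the trace is sampled at the "nginx" rate even though a child span reports "nginxdebug", which would be consistent with what the flame graph shows.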

"On the other hand, it seems that the nginx ingress project is moving to OpenTelemetry soon."

We have yet to decide how we'll continue supporting the ingress controller: OpenTelemetry-only, Datadog-specific module, etc. For now we continue to maintain this OpenTracing-based plugin.

dgoffredo · Dec 09 '22 18:12

Might as well keep this open. It's something I'd like to support.

dgoffredo · Dec 09 '22 18:12

I have tested setting opentracing_tag service.name at the server level instead of the location level, and it does not help.

nginx.ingress.kubernetes.io/server-snippet: |
  opentracing_tag service.name "nginxdebug";

Using opentracing_trace_locations off; does not help either.

I also tried another approach: using name instead of service in the sampling rules. However, I could not find a way to override the value defined in operation_name_override. None of the following:

        opentracing_tag resource.name "debug";
        opentracing_operation_name "debug";
        opentracing_location_operation_name "debug";

changes the root resource name (it is still the one defined in operation_name_override, or its default, nginx.handle):

image

Thanks !

marcportabellaclotet-mt · Dec 10 '22 00:12

Is there any update on this topic?

marcportabellaclotet-mt · Mar 11 '23 22:03

None, I'm afraid. My time has been spent on getting the new tracing library into Envoy and on other internal designs.

Work is planned to add the new tracing library to nginx-datadog, and by then at the latest I'll revisit this idea of setting sampling configuration in a more fine-grained way.

Sorry for the delay.

dgoffredo · Mar 13 '23 19:03

Thanks for the update

marcportabellaclotet-mt · Mar 13 '23 19:03

I should also point out that getting the new code into the ingress controller is a separate project, also planned, but that will take longer.

Ideally we'd figure out a way to configure your existing system to do what you want, but I haven't spent the time yet this year.

dgoffredo · Mar 13 '23 19:03

That would be great. Thanks again!

marcportabellaclotet-mt · Mar 14 '23 07:03