
AttributeError when using Sampling Rules Tags as `global_tags` with `ddtrace.opentracer.tracer.Tracer`

Open ndwhelan opened this issue 3 years ago • 13 comments

When using the trace sampling rules tags, as shown in the Datadog APM docs, as global tags, an AttributeError is thrown when creating a span.

Which version of dd-trace-py are you using?

0.47.0

Which version of the libraries are you using?

ddtrace==0.47.0
intervaltree==3.1.0
opentracing==2.4.0
protobuf==3.15.6
six==1.15.0
sortedcontainers==2.3.0
tenacity==7.0.0
wrapt==1.12.1

How can we reproduce your problem?

import ddtrace
from ddtrace.constants import MANUAL_KEEP_KEY
from ddtrace.opentracer import Tracer as DatadogOpenTracer
from ddtrace.opentracer.settings import ConfigKeys

service_name = "my-service"      # placeholder
operation_name = "my-operation"  # placeholder

config = {
    # The value is ignored for this tag. See
    # https://github.com/DataDog/dd-trace-py/blob/v0.47.0/ddtrace/span.py#L245
    ConfigKeys.GLOBAL_TAGS: {MANUAL_KEEP_KEY: None},
}
tracer = DatadogOpenTracer(service_name, config=config, dd_tracer=ddtrace.tracer)
tracer.start_active_span(operation_name)

What is the result that you get?

Traceback (most recent call last):
  File "test-module/test.py", line 25, in test
  File "/.pyenv/versions/test-virtualenv/lib/python3.6/site-packages/ddtrace/opentracer/tracer.py", line 164, in start_active_span
    ignore_active_span=ignore_active_span,
  File "/.pyenv/versions/test-virtualenv/lib/python3.6/site-packages/ddtrace/opentracer/tracer.py", line 266, in start_span
    service=self._service_name,
  File "/.pyenv/versions/test-virtualenv/lib/python3.6/site-packages/ddtrace/tracer.py", line 497, in start_span
    span.set_tags(self.tags)
  File "/.pyenv/versions/test-virtualenv/lib/python3.6/site-packages/ddtrace/span.py", line 290, in set_tags
    self.set_tag(k, v)
  File "/.pyenv/versions/test-virtualenv/lib/python3.6/site-packages/ddtrace/span.py", line 246, in set_tag
    self.context.sampling_priority = priority.USER_KEEP
AttributeError: 'NoneType' object has no attribute 'sampling_priority'

What is the result that you expected?

A span is successfully created with the sampling priority correctly set. Tentatively, I believe this can be fixed by passing the Datadog context to the span before the ddtrace and OpenTracing spans and their contexts are merged. That can be done by passing in context=context when creating a ddtrace.span.Span at https://github.com/DataDog/dd-trace-py/blob/v0.47.0/ddtrace/tracer.py#L433 and https://github.com/DataDog/dd-trace-py/blob/v0.47.0/ddtrace/tracer.py#L451
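For illustration only (this is not the real ddtrace code, just a toy sketch of the failure mode the traceback shows and why passing a context would avoid it):

    # Toy sketch only: mimics the tag handling at ddtrace/span.py#L245 to show
    # why a span created without a context raises, and why passing one would not.
    from ddtrace.context import Context
    from ddtrace.ext import priority


    class ToySpan(object):
        def __init__(self, context=None):
            self.context = context  # the opentracer path currently leaves this as None

        def mark_manual_keep(self):
            # Dereferences self.context, as the real set_tag() does for MANUAL_KEEP
            self.context.sampling_priority = priority.USER_KEEP


    ToySpan(context=Context()).mark_manual_keep()  # works once a context is passed
    ToySpan().mark_manual_keep()                   # AttributeError, as reported above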

ndwhelan avatar Mar 25 '21 00:03 ndwhelan

I've found something additional that makes me wonder whether I'm prepared to make a PR; it's not clear how the maintainers would prefer this to be handled...

I've since tried a new approach, having discovered some special tag handling on the OpenTracing side in ddtrace.opentracer.span.Span.set_tag:

import ddtrace
from ddtrace.ext import priority
from ddtrace.opentracer import Tracer as DatadogOpenTracer
from ddtrace.opentracer.tags import Tags
from ddtrace.opentracer.settings import ConfigKeys

service_name = "my-service"      # placeholder
operation_name = "my-operation"  # placeholder

config = {
    ConfigKeys.GLOBAL_TAGS: {Tags.SAMPLING_PRIORITY: priority.USER_KEEP},
}
tracer = DatadogOpenTracer(service_name, config=config, dd_tracer=ddtrace.tracer)
tracer.start_active_span(operation_name)

However, it doesn't appear that global tags are ever applied to the OpenTracing span. Which is to say, global tags don't come into play at https://github.com/DataDog/dd-trace-py/blob/v0.47.0/ddtrace/opentracer/tracer.py#L280
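For completeness, a minimal sketch of setting the tag directly on the started span, which does go through that special handling (names are placeholders, not part of the original report):

    import ddtrace
    from ddtrace.ext import priority
    from ddtrace.opentracer import Tracer as DatadogOpenTracer
    from ddtrace.opentracer.tags import Tags

    tracer = DatadogOpenTracer("my-service", dd_tracer=ddtrace.tracer)

    # Set the sampling-priority tag on the span itself (rather than via
    # GLOBAL_TAGS), so it goes through ddtrace.opentracer.span.Span.set_tag.
    scope = tracer.start_active_span("my-operation")
    scope.span.set_tag(Tags.SAMPLING_PRIORITY, priority.USER_KEEP)
    scope.close()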

ndwhelan avatar Mar 25 '21 01:03 ndwhelan

General workaround:

Use a callback to call span.set_tag(Tags.SAMPLING_PRIORITY, priority.USER_KEEP) when initializing the OpenTracing integration. Both the Django and Flask libraries under opentracing-contrib/ have this feature.
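A rough sketch of that workaround, assuming flask_opentracing 1.x's FlaskTracing and its start_span_cb hook (the exact hook name and setup may differ in your integration):

    import ddtrace
    from ddtrace.ext import priority
    from ddtrace.opentracer import Tracer as DatadogOpenTracer
    from ddtrace.opentracer.tags import Tags
    from flask import Flask
    from flask_opentracing import FlaskTracing

    app = Flask(__name__)
    tracer = DatadogOpenTracer("my-service", dd_tracer=ddtrace.tracer)


    def _keep_trace(span):
        # Force USER_KEEP on every span the integration starts for this service.
        span.set_tag(Tags.SAMPLING_PRIORITY, priority.USER_KEEP)


    tracing = FlaskTracing(tracer, trace_all_requests=True, app=app,
                           start_span_cb=_keep_trace)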

Also, some context on OpenTracing: the reason the tag here differs from my first example is that supporting that tag is an OpenTracing best practice.

ndwhelan avatar Mar 25 '21 03:03 ndwhelan

hey @ndwhelan thank you for reaching out!

Do you mind me asking your use case to configure a global tag for sampling priority/manual keep?

brettlangdon avatar Mar 25 '21 13:03 brettlangdon

@brettlangdon we have a scenario where a sampling rate configured by the clients of a service may be inadequate for the service. The idea is that setting the priority manually should give us a partial trace starting at the service, even if the caller's sampling logic decided not to sample the trace. Please let us know if there's a better way to approach this problem.

deadok22 avatar Mar 25 '21 13:03 deadok22

@deadok22 got it, to reiterate to ensure I am understanding correctly.

You have:

Service A -> Service B

Service A is controlling the sampling decision, but you want Service B to send more traces, even if that means you have Service B traces missing the Service A portion in the flamegraph.

Is this accurate?

brettlangdon avatar Mar 25 '21 13:03 brettlangdon

Yes, that's exactly what we're after

deadok22 avatar Mar 25 '21 14:03 deadok22

@deadok22 thank you for confirming. Let me check with some people internally, I think we have alternative methods we are suggesting for people, but I need to make sure I have my information correct first.

brettlangdon avatar Mar 25 '21 16:03 brettlangdon

@deadok22 sorry for the delay getting back to you. Would you be able to expand a bit more on why you want to manually adjust the sampling decision for "Service B" (in our example)?

Is your concern around having more flamegraphs in our UI to inspect, around the impact of sampling on our trace.* metrics, or something else entirely?

brettlangdon avatar Mar 26 '21 22:03 brettlangdon

Hey Brett. If it's not clear, Sergey and I work together, but in very different time zones. I can at least partially answer this.

It's something else entirely.

We have sampling rules for core shared infrastructure (Service A) with sampling rates that make sense for the amount of traffic it sees. Just downstream, we have services that are very sparsely called, and we really want APM data for each request to such a service (Service B). Actually, consider a myriad of Services B through Z. We want individual teams and developers to set different sampling rates for Services B through Z based on their needs: 100% for B, 50% for C, 65% for D, and so on, nondeterministically. That is unachievable if Service A sits in front of Services B, C, D, ..., Z.

We opened a support ticket through Datadog's Zendesk support site. I think it's best we continue any conversations on more details or why this is the case there, as it may reveal internal details inappropriate for this medium.

I hope this helps!

Best,

Nick

ndwhelan avatar Mar 27 '21 00:03 ndwhelan

@ndwhelan and @deadok22 sorry for the long delay, this is something we have been discussing internally. We don't have a clear solution for this right now, but we are still discussing how this could be done either short term (like you are attempting now) or if there is a longer term solution/timeline.

brettlangdon avatar Mar 31 '21 13:03 brettlangdon

@brettlangdon I just wanted to share that using the workaround with a callback has been working for my team.

ndwhelan avatar Apr 06 '21 00:04 ndwhelan

@ndwhelan thanks for sharing the update. I am going to keep this issue open since it is a legitimate use case that we don't have a great solution for yet.

brettlangdon avatar Apr 06 '21 13:04 brettlangdon

hey @ndwhelan and @deadok22, sorry for the long delay getting back to you, we have been discussing this internally to determine the best option.

Today we don't have any alternative solution other than to set MANUAL_KEEP (USER_KEEP) on any downstream services where you wish to override the sampling decision.
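A minimal sketch of what that could look like in the downstream service, assuming plain ddtrace (the operation and service names are placeholders):

    from ddtrace import tracer
    from ddtrace.constants import MANUAL_KEEP_KEY

    # Inside Service B's request handler: force the trace to be kept regardless
    # of the upstream (Service A) sampling decision.
    with tracer.trace("service_b.handle_request", service="service-b") as span:
        span.set_tag(MANUAL_KEEP_KEY)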

We have gotten similar use cases from other customers so we intend to brainstorm alternative options and find a better path forward in the future.

brettlangdon avatar Apr 19 '21 14:04 brettlangdon

Hey @brettlangdon, any updates here?

deadok22 avatar Mar 07 '23 01:03 deadok22

Manually doing what the GitHub Action was supposed to do (fixed in https://github.com/DataDog/dd-trace-py/pull/7835):

This issue has been automatically closed after six months of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.

emmettbutler avatar Dec 04 '23 18:12 emmettbutler