dd-trace-py
dd-trace-py copied to clipboard
Updated IntegrationConfig object
In order to add service and service name to global config from the global environment variable DD_SERVICE, IntegrationConfig.init has been updated.
Description
Now some libraries ignore service name and use default attribute _default_service for trace.
For instance, lib requests has this problem.
As a result, we see requests instead of the real service name
Let's reproduce this problem
we have these global variables:
DD_SERVICE=senderXXX
DD_ENV=TEST
DD_AGENT_HOST=datadog-agent # working datadog agent
Python code:
import requests
from ddtrace import tracer
from ddtrace import patch_all
patch_all()
with tracer.trace("parent") as span:
r = requests.get("https://google.com")
Now we can resolve this problem with this global variable:
DD_REQUESTS_SERVICE_NAME=senderXXX
In order to avoid this problem, I extended IntegrationConfig.init method.
Checklist
- [ ] Title must conform to conventional commit.
- [ ] Add additional sections for
featandfixpull requests. - [ ] Ensure tests are passing for affected code.
- [ ] Library documentation and/or Datadog's documentation site is updated. Link to doc PR in description.
Motivation
Design
Testing strategy
Relevant issue(s)
Testing strategy
Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] PR cannot be broken up into smaller PRs.
- [ ] Avoid breaking API changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [ ] Release note has been added for fixes and features, or else
changelog/no-changeloglabel added. - [ ] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
hey @dart-neitro if I understand you correctly, you want to have all spans produced from your Python application to share the same service name?
As of right now any integration which represents a call to an external service (http, database, gRPC, etc) purposefully sets a different service name from DD_SERVICE. There are DD_<INTEGRATION>_SERVICE environment variables available to allowing configuring for each integration independently.
Unfortunately we will not be able to accept the change you have proposed at this time since it would be a major breaking change from the existing behavior people rely on. What I can recommend doing instead, if you would be willing to open an issue describing the your desired behavior of the library we can discuss what options we have to move forward.
Hey @brettlangdon,
Yes, you absolutely right. I expect libraries to have the same behavior.
For instance, I've experimented with requests, httpx and aiohttp client and only the requests has this behavior.
In fact, this is weird for me. I expect requests, httpx (sync and async) and aiohttp client would have the same behavior.
I'd understood that it's breaking change after I researched tests that were broken after my PR. I have being continued ddtrace CI/CD process and maybe it's good idea to create demo to demonstrate this problem.
As I wrote early, I found DD_REQUESTS_SERVICE_NAME in source of the code base and solved my issue.
However, I think this information has to be added to docs and make this point more explicit. Now if I try to find something in google by this query: DD_REQUESTS_SERVICE_NAME I get this results:
Your search - DD_REQUESTS_SERVICE_NAME - did not match any documents.
I am going to create demo that demonstrates this issue and open an issue after that.
Thanks for your answer.
hey @dart-neitro, makes sense. Yeah, requests is an older integration so likely that it is different, but still difficult to change since it would be a breaking change.
hmm, if DD_REQUESTS_SERVICE isn't working that is definitely a bug. I cannot see why that would be happening though 🤔
I won't try to claim this code is easy to understand, but this is where we resolve the service name for requests.request spans:
https://github.com/DataDog/dd-trace-py/blob/5b97489e2/ddtrace/contrib/requests/connection.py#L75-L84
By default there is no config stored on the requests.Session instances, so we should always be falling back to trace_utils.ext_service(None, config.requests) which should be picking up the ddtrace.config.requests.service setting (set by DD_REQUESTS_SERVICE)
If you end up with a reproduction that would be great!
hey @brettlangdon, The peace of code that you mentioned early ( https://github.com/DataDog/dd-trace-py/blob/5b97489e2/ddtrace/contrib/requests/connection.py#L75-L84 ) does not work in this case. Why? When you patch module, for instance, requests. You use the patch function from module. https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/contrib/requests/patch.py#L25-L32
How we add requests to global config:
Initial module import here:
https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/_monkey.py#L262
Than we execute this: https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/contrib/requests/patch.py#L15-L22
Than we get the attribute of config: https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/settings/config.py#L245
The next "breakpoint" is very important:
https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/settings/config.py#L215
since our IntegrationConfig(self, name) has service_name=None and service=None
Since we get service_name and service from "DD_%s_SERVICE"* or "DD_%s_SERVICE_NAME"* only:
https://github.com/DataDog/dd-trace-py/blob/5b97489e2b073fa773819904cdef26981a5a28df/ddtrace/settings/integration.py#L46-L52
- DD_REQUESTS_SERVICE_NAME or DD_REQUESTS_SERVICE for requests.
As a result, ddtrace.config._config contains correct service attribute (from the global environment DD_SERVICE), however ddtrace.config._config['requests'] contains None as service value.
Let's check it: global environments: DD_SERVICE=senderXXX
from ddtrace import config
from ddtrace import patch
patch(requests=True)
config._config['requests']
assert config._config['requests'].service_name is None
assert config._config['requests'].service is None
assert config.service == "senderXXX"
As a result, this peace of code (https://github.com/DataDog/dd-trace-py/blob/5b97489e2/ddtrace/contrib/requests/connection.py#L75-L84) does not work as expected because of None-s.
PS I am working on demo, I will comment here when I finish it
hey @dart-neitro, thanks for the detailed walk through.
Yes, that makes complete sense why DD_SERVICE does not impact the service name for requests.
Something for me to discuss with the rest of the team, but httpx/aiohttp are newer integrations where we determined the behavior you are describing as the reasonable default. However, since this would be a breaking change to requests we need to figure out how to address.
Some options I am thinking about:
- Add a feature flag that when toggled will fix the behavior for
requests- If it is a
requestsonly feature flag, then it might not be any different from settingDD_REQUESTS_SERVICE
- If it is a
- Add a global feature flag to change this behavior across all integrations
- Better, not
requestsspecific, so the impact will be larger... assuming someone wants this across all integrations and not justrequests
- Better, not
- Wait until 2.0.0 release and we change the default behavior for
requestsand any other integrations following this pattern
A short term mitigation, if you don't want to manually set the service name for requests...
import ddtrace
ddtrace.config.requests.service = ddtrace.config.service
hey @brettlangdon,
AFAIU, httpx integration works like this.
First of all, we use patch without _default_service and this point dramatically change situation.
We use this wrapper func:
https://github.com/DataDog/dd-trace-py/blob/1.x/ddtrace/contrib/httpx/patch.py#L101
Please pay your attention to _get_service_name().
By default we use ext_service() in _get_service_name
https://github.com/DataDog/dd-trace-py/blob/33b9ac313b1e039f9cdb0a7953ba7a6a0c730162/ddtrace/contrib/httpx/patch.py#L59
And here we use explicit logic https://github.com/DataDog/dd-trace-py/blob/33b9ac313b1e039f9cdb0a7953ba7a6a0c730162/ddtrace/contrib/trace_utils.py#L214-L232
Very important point is here. We get None value as a result.
Then we go though this way:
https://github.com/DataDog/dd-trace-py/blob/33b9ac313b1e039f9cdb0a7953ba7a6a0c730162/ddtrace/contrib/httpx/patch.py#L124
then
https://github.com/DataDog/dd-trace-py/blob/33b9ac313b1e039f9cdb0a7953ba7a6a0c730162/ddtrace/tracer.py#L776-L783
And this final point how we get service name.
https://github.com/DataDog/dd-trace-py/blob/33b9ac313b1e039f9cdb0a7953ba7a6a0c730162/ddtrace/tracer.py#L593-L597
In fact, we use config.service.
P.S. I suppose that removing _default_service from requests can reproduce this behavior, but I need check
# requests default settings
config._add(
"requests",
{
"distributed_tracing": asbool(os.getenv("DD_REQUESTS_DISTRIBUTED_TRACING", default=True)),
"split_by_domain": asbool(os.getenv("DD_REQUESTS_SPLIT_BY_DOMAIN", default=False)),
"_default_service": "requests", # remove this line
},
)
@brettlangdon, I've updated MR, now it's affect the requests lib only.
@brettlangdon, I'm sorry to have kept you waiting, I've prepared demo I mentioned early: https://github.com/dart-neitro/ddtrace-requests-issue If you find I forget about something, please, let me know
My take after reading through the discussion here is that this change is mergeable when the next major version comes around, because it changes a default that lots of customers are relying on. I also think that it's not worth the effort to put this behavior change behind a feature flag because of how small the workaound code is.
You have successfully added a new CodeQL configuration .github/workflows/codeql-analysis.yml:analyze/language:python. As part of the setup process, we have scanned this repository and found 11 existing alerts. Please check the repository Security tab to see all alerts.
@dart-neitro since this PR has not had any activity for awhile I am going to close it for now. Please feel free to update the PR from the base and re-open!