opentelemetry-php icon indicating copy to clipboard operation
opentelemetry-php copied to clipboard

rework ServiceInfoFactory priorities

Open brettmc opened this issue 5 months ago • 8 comments

  • default/all: load detectors from registry last
  • env-provided detectors: load in same order as default/all, then from registry in requested order

Related: #1643

brettmc avatar Jun 19 '25 02:06 brettmc

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 68.20%. Comparing base (00417cf) to head (9fa81e9). :warning: Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1644      +/-   ##
============================================
- Coverage     68.24%   68.20%   -0.04%     
+ Complexity     2919     2912       -7     
============================================
  Files           435      436       +1     
  Lines          8876     8890      +14     
============================================
+ Hits           6057     6063       +6     
- Misses         2819     2827       +8     
Flag Coverage Δ
8.1 67.86% <100.00%> (+0.03%) :arrow_up:
8.2 68.09% <100.00%> (-0.04%) :arrow_down:
8.3 68.14% <100.00%> (+0.09%) :arrow_up:
8.4 68.15% <100.00%> (+0.08%) :arrow_up:
8.5 68.09% <100.00%> (+0.06%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/SDK/Resource/Detectors/Service.php 87.50% <100.00%> (-12.50%) :arrow_down:
src/SDK/Resource/Detectors/ServiceInstance.php 100.00% <100.00%> (ø)
src/SDK/Resource/ResourceInfoFactory.php 97.95% <100.00%> (-2.05%) :arrow_down:

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 00417cf...9fa81e9. Read the comment docs.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Jun 19 '25 02:06 codecov[bot]

The way I’m reading https://github.com/open-telemetry/opentelemetry-specification/blob/v1.45.0/specification/resource/sdk.md#specifying-resource-information-via-an-environment-variable is that the Environment detector always has to be last, even after other registry-provided custom detectors, as part of the default detector. I could be wrong though.

Edit: it also sounds like it’s mandatory.

smaddock avatar Jun 19 '25 03:06 smaddock

@smaddock great question. Do you mean this part of spec?

The SDK MUST extract information from the OTEL_RESOURCE_ATTRIBUTES environment variable and merge this, as the secondary resource, with any resource information provided by the user, i.e. the user provided resource information has higher priority.

I think that does imply that environment detection should be last(ish). But I also think the spec has painted us into a corner here:

  • env must be last so that user-provided info takes precedence over almost anything else
  • except for OTEL_SERVICE_NAME, which has higher priority than OTEL_RESOURCE_ATTRIBUTES=service.name=xxx, which means the service detector should be higher priority than env detector
  • which means that service detector is just about the highest priority detector
  • which means that service.instance.id as provided by service detector is higher priority than that provided by another detector

So, what to do? I agree with @GrzegorzDrozd issue, service.instance.id needs to be lower priority, or opt-in/opt-out. This will be an issue for #1628

The options I see are:

  • move service.instance.id out of the service detector into service_instance and make it opt-in
  • leave it there, but provide an env var to opt-out
  • remove it entirely, delegating to specialised implementations

I think that shared-nothing runtimes are still in the majority of PHP usage, and so our current offering for service.instance.id is not going to be useful for the majority.

For reference, the current guidance for service.instance.id is https://github.com/open-telemetry/semantic-conventions/blob/v1.34.0/docs/registry/attributes/service.md

brettmc avatar Jun 19 '25 04:06 brettmc

Ah, I see the circular definition now. I might map that out visually for my own sake, but for the purposes of this PR, this is the part I keep coming back to:

SDKs… are free to use an inherent unique ID as the source of this value if stability is desirable.

What if we had a config/env setting that toggles between:

  • the default ephemeral UUIDv4 if unset, and
  • a more stable UUIDv5 based on some X factor, which is passed as the value of the config/env setting. Could be the Apache process ID, host UUID… whatever the user wants.

smaddock avatar Jun 19 '25 05:06 smaddock

Ah, I see the circular definition now. I might map that out visually for my own sake, but for the purposes of this PR, this is the part I keep coming back to:

SDKs… are free to use an inherent unique ID as the source of this value if stability is desirable.

What if we had a config/env setting that toggles between:

* the default ephemeral UUIDv4 if unset, and

* a more stable UUIDv5 based on some X factor, which is passed as the value of the config/env setting. Could be the Apache process ID, host UUID… whatever the user wants.

But to be most useful it would have to be provided by the developer - in my case I am using machine-id + sha of deployed git commit - that way I can have multiple persistent instances/machines + deployed version and it changes every time new code is deployed. Plus this has to be provided by dev as soon as possible (at least detector name) because once factories are created in autoload there is no way to change that. In my case I am overriding OTEL_PHP_DETECTORS on top of the file/in env settings for the machine:

putenv('OTEL_PHP_DETECTORS=env,host,process,static-service-id');

and then after autoload is required i am just registering:

Registry::registerResourceDetector('static-service-id', new StaticServiceId(GIT_COMMIT_ID));

GrzegorzDrozd avatar Jun 22 '25 11:06 GrzegorzDrozd

@brettmc It looks like it will solve my issue - I will test this branch early next week.

GrzegorzDrozd avatar Jun 22 '25 11:06 GrzegorzDrozd

Any resource attribute should have mechanisms for users/developers to create overrides. Environment variables or manually setting the attributes as mentioned in https://github.com/open-telemetry/opentelemetry-php/issues/1643#issuecomment-2985050911 should always override them.

My thoughts above were around what methods of setting the service.instance.id would still comply with the specifications odd requirements. Since the spec gives SDKs some leeway in this, I was imagining what a flexible option there might look like. It doesn’t sound to me like the spec allows for additional resource detectors to override the service resource detector, but I’m just some guy asking questions.

smaddock avatar Jun 22 '25 11:06 smaddock

I've updated the PR, splitting off a new ServiceInstance detector, and noting the spec deviation:

  • not used in default/all case
  • runs before detectors from registry

Since the portion of the spec that's tripping us up here has development stability, I'll create an issue to see if we can relax that wording. (edit: https://github.com/open-telemetry/opentelemetry-specification/pull/4570 )

brettmc avatar Jun 24 '25 05:06 brettmc