semantic-conventions
semantic-conventions copied to clipboard
Implement code-generation hints to drop/rename attributes in case of a collision
This is a tracking issue for https://github.com/open-telemetry/semantic-conventions/issues/1118#issuecomment-2173803006 Phase 2
Currently we don't allow collisions between attribute/event/etc names in the code-friendly representation of them (. replaced with _).
But we have some attributes that would benefit from renaming (e.g. db.cosmosdb.client_id should stay consistent with messaging.client.id).
We can resolve it by providing code-generation hints in the yaml that would help code-generator to resolve collisions.
Scenarios to consider
- attribute is deprecated, the replacement has the same constant name (
messaging.client_id->messaging.client.id):- code-generator hint could tell to drop it (the decision we made in https://github.com/open-telemetry/semantic-conventions/issues/1031)
- attribute is deprecated, the replacement has the same constant name, but different type
- code generator hint could tell to drop it (same as scenario 1)
- OR code-generator may need to generate both attributes, so needs a hint to use a different name for the replacement attribute
- Attribute name contains non-ASCII characters and needs a code-generator-friendly name ...
Proposal
Combining with https://github.com/open-telemetry/semantic-conventions/issues/1419 to keep it consistent with deprecation if possible.
Scenarios 1 & 2i
- id: registry.foo
...
attributes:
- id: foo.bar_id
deprecated: # not in the scope of this item
action: rename
renamed_to: foo.bar.id
code_generation_hint: # <-- this is the proposal for scenarios 1 and 2i above
action: exclude # <-- this is the proposal for scenarios 1 and 2i above
Scenarios 2ii & 3
- id: registry.foo
...
attributes:
- id: foo.bar_code
type: int
deprecated:
action: remove
note: Renamed to `foo.bar.code` and type changed to string
- id: foo.bar.code
type: string
code_generation_hint: # <-- this is the proposal for scenario 2ii
code_friendly_name: foo_bar_code2 # <-- this is the proposal for scenario 2ii
- id: pprof::foo.bar
code_generation_hint: # <-- this is the proposal for scenario 3
code_friendly_name: pprof_foo_bar # <-- this is the proposal for scenario 3
Alternative proposal
- id: registry.foo
...
attributes:
- id: foo.bar_id
deprecated: # not in the scope of this item
action: rename
renamed_to: foo.bar.id
code_generation_hint: exclude # <-- this is the proposal for scenarios 1 and 2i above
- id: pprof::foo.bar
code_friendly_name: pprof_foo_bar # <-- this is the proposal for scenario 2ii & 3
The following policies will be necessary:
code_generator: action: removeor alternatives MUST NOT appear on stable things.- if
code_friendly_nameis not set, it's considered to be{id}.replace('.', '_')and MUST NOT change for experimental or stable things in the future version (i.e.code_friendly_namecan be specified once at the attribute creation time (or within the same version attribute was defined) - Same with
code_generation_hint: action: excludeor alternatives - it SHOULD NOT be changed for existing attributes. Except current exceptions - https://github.com/open-telemetry/semantic-conventions/blob/6c1c9761263693307c5e7a10275a8b929b72695d/policies/attribute_name_collisions.rego#L82
Based on the discussions on slack there is a preference for option 1:
code_generation_hint:
action: exclude
code_friendly_name: <>
#1624 is currently blocked because we are trying to rename code.function to code.function.name and code.column to code.column.number.
I understand from the description of https://github.com/open-telemetry/semantic-conventions/pull/1614 PR that this PR could provide a solution to this, but it's more about "conflicts with dots to underscores" rather than "adding suffix to an already existing semconv attribute".
For https://github.com/open-telemetry/semantic-conventions/pull/1613 the rename of db.system.* to db.system.name suggested in https://github.com/open-telemetry/semantic-conventions/issues/1581 was avoided by using a different attribute prefix: db.provider.name which does not conflict.
In the discussion of https://github.com/open-telemetry/semantic-conventions/issues/1581 there is no mention adding a .name suffix could be something not allowed, so am I right to assume that this should be something we should be allowed to do ?
If this change can help with namespace suffix changes, then it could also probably help unblock https://github.com/open-telemetry/semantic-conventions/pull/1638 where creating a new attribute that reuses the namespace of an existing attribute is not possible (geo.location).
How would this look like in weaver?
From 2/12/2025 discussion:
We could have a filter in weaver to pull code friendly names, like:
{{ attribute | code_friendly_name | the_case_i_want }}
Additionally we should think about having default JQ filters respect action: drop by default.
updated proposal based on the discussion:
- Let's tackle exclusion first, we don't have a real case that'd need
code_friendly_nameyet. - Let's have
hintsandcode_generationas one of the possible hints. We might want to put PII and backward-migration hints there.
So the proposal
hints:
code_generation:
action: exclude
Future possibilities
hints:
code_generation:
code_friendly_id: <id>
sensitivity_level: PII
...
From discussion -
Should this be an annotation vs. a hint?
Can we have annotations/hints/tags be raw JSON?
Can we have validation they match an expected schema via new "group" that defines the annotation type?
Example:
groups:
type: annotation_type
... defined code_generation.exclude annotation ...
groups:
type: attribute_group
attributes:
- id: ...
annotations:
code_generation:
exclude: true
In rust, jinja, jq I could interact with these as raw JSON-ish value or extract via a type if desired, e.g. this could be defined in weaver_forge:
#[derive(Serialize,Deserilize)]
struct CodeGeneration {
exclude
}
group.get_annotation<JsonValue>("code_generation")
Fixed with upcoming release of weaver.