semantic-conventions icon indicating copy to clipboard operation
semantic-conventions copied to clipboard

Implement code-generation hints to drop/rename attributes in case of a collision

Open lmolkova opened this issue 1 year ago • 2 comments

This is a tracking issue for https://github.com/open-telemetry/semantic-conventions/issues/1118#issuecomment-2173803006 Phase 2

Currently we don't allow collisions between attribute/event/etc names in the code-friendly representation of them (. replaced with _).

But we have some attributes that would benefit from renaming (e.g. db.cosmosdb.client_id should stay consistent with messaging.client.id).

We can resolve it by providing code-generation hints in the yaml that would help code-generator to resolve collisions.

lmolkova avatar Oct 09 '24 18:10 lmolkova

Scenarios to consider

  1. attribute is deprecated, the replacement has the same constant name (messaging.client_id -> messaging.client.id):
    • code-generator hint could tell to drop it (the decision we made in https://github.com/open-telemetry/semantic-conventions/issues/1031)
  2. attribute is deprecated, the replacement has the same constant name, but different type
    1. code generator hint could tell to drop it (same as scenario 1)
    2. OR code-generator may need to generate both attributes, so needs a hint to use a different name for the replacement attribute
  3. Attribute name contains non-ASCII characters and needs a code-generator-friendly name ...

Proposal

Combining with https://github.com/open-telemetry/semantic-conventions/issues/1419 to keep it consistent with deprecation if possible.

Scenarios 1 & 2i

- id: registry.foo
  ...
  attributes:
    - id: foo.bar_id
      deprecated:            # not in the scope of this item
        action: rename
        renamed_to: foo.bar.id
      code_generation_hint:  # <-- this is the proposal for scenarios 1 and 2i above
        action: exclude      # <-- this is the proposal for scenarios 1 and 2i above

Scenarios 2ii & 3

- id: registry.foo
  ...
  attributes:
    - id: foo.bar_code
      type: int
      deprecated:
        action: remove
      note: Renamed to `foo.bar.code` and type changed to string
    - id: foo.bar.code
      type: string
      code_generation_hint:                # <-- this is the proposal for scenario 2ii
        code_friendly_name: foo_bar_code2  # <-- this is the proposal for scenario 2ii
    - id: pprof::foo.bar
      code_generation_hint:                # <-- this is the proposal for scenario 3 
        code_friendly_name: pprof_foo_bar  # <-- this is the proposal for scenario 3 

Alternative proposal

- id: registry.foo
  ...
  attributes:
    - id: foo.bar_id
      deprecated:                         # not in the scope of this item
        action: rename
        renamed_to: foo.bar.id
      code_generation_hint: exclude       # <-- this is the proposal for scenarios 1 and 2i above
    - id: pprof::foo.bar
      code_friendly_name:  pprof_foo_bar  # <-- this is the proposal for scenario 2ii & 3 

lmolkova avatar Oct 09 '24 18:10 lmolkova

The following policies will be necessary:

  • code_generator: action: remove or alternatives MUST NOT appear on stable things.
  • if code_friendly_name is not set, it's considered to be {id}.replace('.', '_') and MUST NOT change for experimental or stable things in the future version (i.e. code_friendly_name can be specified once at the attribute creation time (or within the same version attribute was defined)
  • Same with code_generation_hint: action: exclude or alternatives - it SHOULD NOT be changed for existing attributes. Except current exceptions - https://github.com/open-telemetry/semantic-conventions/blob/6c1c9761263693307c5e7a10275a8b929b72695d/policies/attribute_name_collisions.rego#L82

lmolkova avatar Oct 09 '24 19:10 lmolkova

Based on the discussions on slack there is a preference for option 1:

      code_generation_hint: 
        action: exclude
        code_friendly_name: <>

lmolkova avatar Oct 14 '24 15:10 lmolkova

#1624 is currently blocked because we are trying to rename code.function to code.function.name and code.column to code.column.number.

I understand from the description of https://github.com/open-telemetry/semantic-conventions/pull/1614 PR that this PR could provide a solution to this, but it's more about "conflicts with dots to underscores" rather than "adding suffix to an already existing semconv attribute".

For https://github.com/open-telemetry/semantic-conventions/pull/1613 the rename of db.system.* to db.system.name suggested in https://github.com/open-telemetry/semantic-conventions/issues/1581 was avoided by using a different attribute prefix: db.provider.name which does not conflict. In the discussion of https://github.com/open-telemetry/semantic-conventions/issues/1581 there is no mention adding a .name suffix could be something not allowed, so am I right to assume that this should be something we should be allowed to do ?

SylvainJuge avatar Dec 02 '24 10:12 SylvainJuge

If this change can help with namespace suffix changes, then it could also probably help unblock https://github.com/open-telemetry/semantic-conventions/pull/1638 where creating a new attribute that reuses the namespace of an existing attribute is not possible (geo.location).

SylvainJuge avatar Dec 02 '24 10:12 SylvainJuge

How would this look like in weaver?

From 2/12/2025 discussion:

We could have a filter in weaver to pull code friendly names, like:

{{ attribute | code_friendly_name | the_case_i_want }}

Additionally we should think about having default JQ filters respect action: drop by default.

jsuereth avatar Feb 12 '25 15:02 jsuereth

updated proposal based on the discussion:

  1. Let's tackle exclusion first, we don't have a real case that'd need code_friendly_name yet.
  2. Let's have hints and code_generation as one of the possible hints. We might want to put PII and backward-migration hints there.

So the proposal

hints:
  code_generation:
     action: exclude

Future possibilities

hints:
  code_generation:
     code_friendly_id: <id>
  sensitivity_level: PII
  ...

lmolkova avatar Feb 13 '25 02:02 lmolkova

From discussion -

Should this be an annotation vs. a hint? Can we have annotations/hints/tags be raw JSON? Can we have validation they match an expected schema via new "group" that defines the annotation type?

Example:

groups:
  type: annotation_type
  ... defined code_generation.exclude annotation ...
groups:
   type: attribute_group
   attributes:
   - id: ...
     annotations:
       code_generation:
         exclude: true

In rust, jinja, jq I could interact with these as raw JSON-ish value or extract via a type if desired, e.g. this could be defined in weaver_forge:

#[derive(Serialize,Deserilize)]
struct CodeGeneration {
  exclude
}

group.get_annotation<JsonValue>("code_generation")

jsuereth avatar Mar 19 '25 14:03 jsuereth

Fixed with upcoming release of weaver.

jsuereth avatar Apr 09 '25 14:04 jsuereth