build-tools Allow updating enum values when referencing attribute

Semantic conventions that reference a common attribute with an enum type, should be able to extend/overwrite the list of possible values.

E.g.

- id: http.request.method
  type:
    allow_custom_values: true
    members:
      - id: delete
        value: "DELETE"
        brief: 'DELETE method.'
      - id: get
        value: "GET"
        brief: 'GET method.'
...

- ref: http.request.method
  type:
    allow_custom_values: true
    members:
      - id: purge
        value: "PURGE"
        brief: 'non-standard PURGE method.'

currently redeclaring a type is not supported.

While additional values could be documented in the note, md generation adds another note with possible values according to enum.

Related: https://github.com/open-telemetry/semantic-conventions/pull/205

Jul 28 '23 20:07 lmolkova

There's a need for this in https://github.com/open-telemetry/semantic-conventions/pull/258. There's several attributes that are used in different metrics like hw.status. From @bertysentry :

The attributes do have the same meaning across all device types. However, we may want to add an additional values that are specific a type of device. Example: we may want to specify the value charging for the state attribute in the hw.status metric, that is applicable to batteries only (charging).

If these are defined as enums, when -ref them in metrics we will need to add more values to it.

Aug 23 '23 10:08 joaopgrassi

This seems like a bit of a hack to me and maybe a mis-use of "ref" to compensate for a lacking feature. Would it be better to first decouple the enum definition from the attribute definition and define a top-level "types" section alongside "groups"?

Aug 23 '23 15:08 Oberon00

Would it be better to first decouple the enum definition from the attribute definition and define a top-level "types" section alongside "groups"?

would it mean that the same attribute would have different type for different conventions? Or that we'd need to introduce type inheritance?

Aug 23 '23 16:08 lmolkova

Both, I think 😃 (It's the same with the ref: suggestion though, at least I would say that the type becomes a different one if you add values, and that the ref acts as inheritance mechanism)

Aug 24 '23 08:08 Oberon00

BTW I believe enums being enums in code generation is a back-compat pain - when value is removed it breaks every lib that used it.

I think they should be generated as maps instead of enums and then this issue can be resolved without type inheritance or type hell in strongly typed languages.

for example, we have expandable enums (backed by a map) implementation in Azure SDKs, Here's an example in Java

public final class CloudEventDataFormat extends ExpandableStringEnum<CloudEventDataFormat> {
    public static final CloudEventDataFormat BYTES = fromString("BYTES", CloudEventDataFormat.class);
    public static final CloudEventDataFormat JSON = fromString("JSON", CloudEventDataFormat.class);

    public static CloudEventDataFormat fromString(String name) {
        return super.fromString(name, CloudEventDataFormat.class);
    }
}

essentially it feels like an enum, but can create new instances of itself for unregistered strings.

Aug 24 '23 18:08 lmolkova

(or maybe we can introduce new map type so we don't change behavior for existing enums at once)

Aug 24 '23 18:08 lmolkova

when value is removed it breaks every lib that used it

That will be the case no matter how you implement code generation. You can at most make it a runtime error instead of a compile time error which would be bad anyways. Removing values from enums is not covered by our stability guarantees I think, but there is no sensible way to implement instrumentations so defensively that they would survive removal of an enum value they actually use without code changes.

Aug 25 '23 07:08 Oberon00

Watch me implement it in a sensible and back-compatible way: SemConv.HttpMethods.fromString("PURGE")

Aug 25 '23 15:08 lmolkova

Another example is *.system

E.g. db.system has ~50 values. When we define system-specific semconv, we reference db.system and (if we get rid of not-full tables), we end up listing all 50 in each extension: https://github.com/open-telemetry/semantic-conventions/pull/973

I'd like to be able to do something like

- id: db
   ...
   attributes:
    - id: db.system
      type: 
        members: 
        - id: cassandra
        - id: mongodb
        ...

- id: mongodb
   attributes:
    - ref: db.system
      type: 
        # some flag indicating that I don't want to extend everything?
        members: 
        - ref: mongodb

in this case I want only once value to be supported and not inherit all 50 values.

May 01 '24 17:05 lmolkova

Another use-case is the cpu.state that we want to introduce. ref: https://github.com/open-telemetry/semantic-conventions/pull/1026#discussion_r1596882134.

Ideally we want to define the cpu.state and then be able to reference it from the system.*, process.* and container.* metrics. However, we would like to have the option to only reference a subset of the enum members in each metricset and also be able to override the member's description/brief so as to be more accurate per metric type.

May 30 '24 15:05 ChrsMark