build-tools
build-tools copied to clipboard
Add `embed` concept for attributes reuse (in addition to `ref`)
This issue describes a concrete proposal on how to realise attributes reuse through yaml syntax and tooling.
Related issues / PRs:
- https://github.com/open-telemetry/semantic-conventions/pull/351
- https://github.com/open-telemetry/semantic-conventions/pull/559
Proposal
I'm proposing to introduce a new, additional type of attribute reference called embed (alternatively: nest) in the syntax that would behave similarly to ref property BUT with the following difference: When embed is used the referenced attribute will be nested under the namespace of the attribute group in which the attribute is being referenced.
Example
See the following example that illustrates the idea and the difference to ref:
We would define geo attributes in a geo.yaml file:
groups:
- id: geo
prefix: geo
type: attribute_group
brief: Describes the geo location
attributes:
- id: location.lon
type: double
brief: Longitude of the geo location in [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84).
examples: [ -73.614830 ]
- id: location.lat
type: double
brief: Latitude of the geo location in [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84).
examples: [ 45.505918 ]
In client and server files we would embed the attributes.
! Note: Here I consciously embed the geo.location.lon attribute BUT ref the geo.location.lat attribute to demonstrate the difference:
groups:
- id: client
prefix: client
type: attribute_group
brief: ...
attributes:
- id: address
...
- embed: geo.location.lon <<-- THAT IS NEW
- ref: geo.location.lat
When rendering (or code generating) the above it will result in the following attributes:
client.geo.location.longeo.location.lat
Note the difference that geo.location.lon is now nested under the client namespace because it was embedded (embed) while geo.location.lat keeps it's original full name because it was referenced (ref).
Extended proposal
We could even think about an embed_namespaces property on the attribute_group level to simplify embeddings of entire namespaces into other namespace. For example for geo we could do something like the following to embed ALL of the geo.* fields into the client:
groups:
- id: client
prefix: client
type: attribute_group
brief: ...
embed_namespaces:
- id: geo <<-- EMBED the entire geo namespace under client
attributes:
- id: address
...
@open-telemetry/specs-semconv-approvers @open-telemetry/specs-semconv-maintainers would love your thoughts on the above!
Most probably embed_namespaces is more handy because it would save us time from having to embed all of the attributes one by one. However I would like to have support for both so as to have the flexibility to embed only specific attributes if needed.
I cannot think of any particular pitfall here (maybe others can spot sth) but in general I like this proposal +1.
Bumping up this issue and add more information.
I propose to add following keywords:
1. includes
This one will make the whole namespace re-usable and adds it to the current group on the same level - without adding a prefix to the group. It should allow to add multiple namespaces
Example: in the current schema we do have browser.yaml inside models
groups:
- id: browser
prefix: browser
type: resource
brief: >
The web browser in which the application represented by the resource is running.
The `browser.*` attributes MUST be used only for resources that represent applications
running in a web browser (regardless of whether running on a mobile or desktop device).
attributes:
- ref: browser.brands
- ref: browser.platform
- ref: browser.mobile
- ref: browser.language
- ref: user_agent.original
using new keyword it will be written as following:
groups:
- id: browser
prefix: browser
type: resource
includes: [browser] <--- NEW keyword
brief: >
The web browser in which the application represented by the resource is running.
The `browser.*` attributes MUST be used only for resources that represent applications
running in a web browser (regardless of whether running on a mobile or desktop device).
attributes:
- ref: user_agent.original
where in md file we could either unpack namespace into specific fields or just give a link to the registry for the whole namespace:
| [`browser.*`](../attributes-registry/browser.md) | Namespace | Browser namespace | | Opt-In | <---- includes
2. embeds
This one will make the whole namespace re-usable and adds it to the current group UNDER the group using its prefix. It should allow to add multiple namespaces
Example: we would like to add whole geo.* namespace to the client and server to make it possible to attach additional attributes to the both client and server.
groups:
- id: client
prefix: client
type: attribute_group
brief: >
These attributes may be used to describe the client in a connection-based network interaction
.......
attributes:
- id: address
stability: stable
type: string
using new keyword it will be written as following:
groups:
- id: client
prefix: client
type: attribute_group
embeds: [geo] <---- **NEW**
brief: >
These attributes may be used to describe the client in a connection-based network...
attributes:
- id: address
- .....
which resolves into following construction in the md file:
| [`client.geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- embeds
3. embed
This one will make possible to re-use only specific attributes of the namespace, similar to the ref but with a prefix
Example: we don't want to add the whole namespace to the client but only a few attributes.
groups:
- id: client
prefix: client
type: attribute_group
brief: >
These attributes may be used to describe the client in a connection-based network...
attributes:
- ref: client.geo.location.lon
- ...
- embed: geo.location.lat
which resolves into following construction in the md file:
| [`geo.location.lon`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- embed
To summarise we would have following representation of all different types (I have changed real browser example to artificial one for similarity with others):
| [`geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- includes whole namespace
| [`client.geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- embeds whole namespace under client
| [`geo.location.lon`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- referenced attribute
| [`client.geo.location.lat`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- embedded attribute under client
Please let me know what do you think about this idea, I would like to start to work on it soon.
A new proposal for embedded namespaces.
I have started to implement this approach and found out that not all use cases will be covered by proposed design.
The problem is following - in current approach we embed one namespace directly under another one i.e. if we embed geo under client we will have structure client.geo.*, where * means all attributes of geo namespace. This might work for this particular use case but this will not work when a namespace should embed multiple namespaces under different names.
Se following example:
We do have a process namespace. Process has multiple fields, including process.parent which should be of type process namespace. Process could also have fields like real_user, effective_user, saved_user all of the type of user namespace.
In most cases we do have the same name resolution as namespace, i.e. client.geo is of type geo namespace or file.hash could be of type hash namespace and that could lead to false assumption that just embeded namespace would be sufficient. But what we need to have to resolve this is to create a new attribute with provided name and new type as a namespace
I propose following concept:
- we introduce an
embedkeyword (proposal 3 from previous comment) for particular fields from other namespaces that should be embedded under existing namespace
groups:
- id: client
prefix: client
type: attribute_group
brief: >
These attributes may be used to describe the client in a connection-based network...
attributes:
- ref: geo.location.lon <- referenced attribute
- ...
- embed: geo.location.lat <- embedded attribute
is resolved to
| [`geo.location.lon`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- embed
2a. We introduce a new type for id declaration, i.e. in addition to string, enum, etc we will add a namespace
groups:
- id: process
prefix: process
type: attribute_group
brief: ...
attributes:
- id: parent
type: process <- this should be existing namespace in registry
stability: experimental
- id: real_user
type: user <- this should be existing namespace in registry
stability: experimental
2b. We introduce a new keyword for attributes
groups:
- id: process
prefix: process
type: attribute_group
brief: ...
attributes:
- embed_ns: parent
type: process <- this should be existing namespace in registry
stability: experimental
- embed_ns: real_user
type: user <- this should be existing namespace in registry
stability: experimental
| [`process.parent.*`](../attributes-registry/parent.md) | Namespace | Parent namespace | | Opt-In | <---- includes whole namespace
| [`process.real_user.*`](../attributes-registry/user.md) | Namespace | User namespace | | Opt-In | <---- includes whole namespace
@lquerel I think from the implementation POW the second option 2b looks better, WDYT?
Option 2b is cleaner than 2a in my opinion. 2a is very misleading as indicates that it's a single attribute but actually is resolved to a sub-namespace of multiple attributes.
How about the original proposal with a small extension to make the above-mentioned scenario work?. Something like:
groups:
- id: client
prefix: client
type: attribute_group
brief: ...
embed_namespaces:
- id: user
as: real_user
- id: user
as: saved_user
That would avoid the confusion that we would define an entire structure of attributes in the attributes property (as it is the case with 2a and 2b).
that makes sense @AlexanderWert, I like the clear splitting between usual attributes and namespaces
Another thing we have discussed today is that due to similarity in ref and embed we might go another way for a embed option, i.e. there will be no new option but a new flag instead
groups:
- id: client
prefix: client
type: attribute_group
brief: >
These attributes may be used to describe the client in a connection-based network...
attributes:
- ref: geo.location.lon <- referenced attribute
- ...
- ref: geo.location.lat <- embedded attribute
prefix: true
Those will be resolved into
| [`geo.location.lon`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- embed
Do you have any comments for this approach?