semantic-conventions
Decide how to organize "sub namespaces" on registry YAML model files
Context
In some cases, the yaml model file in the attributes registry contains multiple "levels" of attributes. One example is the Database one: https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/db.yaml.
The top id is registry.db, and all attributes go into that group. Since there are multiple db systems, each of them has its name as part of the attribute id, like cassandra.* or mongodb.*.
When generating the markdown for these attributes in the registry, we rely on tags to render the individual db system attribute tables, like <!-- semconv registry.db(omit_requirement_level,tag=db-generic) -->.
Problems with this approach:
- The yaml file is large, and finding a "group" of attributes (say for cassandra) is hard, as they are all together under the same group id: registry.db
- We have to rely on tags to be able to render the markdown table for each individual "thing". Because of this, we have to repeat the same tag for each attribute in yaml, like so: https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/db.yaml#L14
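To make the tag dependency concrete, here is a minimal Python sketch of what tag-based table selection amounts to. It is not the actual markdown generator: the dict is a hypothetical, heavily abbreviated stand-in for the registry.db group, attribute names are written out in full rather than relative to a prefix, and the tag names other than db-generic are assumed for illustration.

```python
# Hypothetical, abbreviated stand-in for the single `registry.db` group.
# Tag names other than `db-generic` are assumptions made for this sketch.
REGISTRY_DB = {
    "id": "registry.db",
    "attributes": [
        {"id": "db.system", "tag": "db-generic"},
        {"id": "db.cassandra.consistency_level", "tag": "db-cassandra"},
        {"id": "db.cassandra.page_size", "tag": "db-cassandra"},
        {"id": "db.mongodb.collection", "tag": "db-mongodb"},
    ],
}


def attributes_for_tag(group, tag):
    """Return the ids of the attributes in `group` that carry `tag`."""
    return [attr["id"] for attr in group["attributes"] if attr.get("tag") == tag]


# Every per-system table depends on each attribute repeating the right tag.
cassandra_table = attributes_for_tag(REGISTRY_DB, "db-cassandra")
print(cassandra_table)
```

An attribute that is missing its tag would silently drop out of every table, which is part of why the repetition is a maintenance burden.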
An alternative to this
Instead of relying on tags, in the model for the registry we can simply organize each individual group under its own id. For example:
- id: registry.db.cassandra
- id: registry.db.mongodb
Pros of this option
- Don't need to use tags
- It's easier to find the attributes per group in the yaml files
- It's easier/clearer to generate markdown tables for each group
An example of this approach can be found in this PR: https://github.com/open-telemetry/semantic-conventions/pull/848/files#diff-3efbd7bfaa9b1122d4421e83e19833ead514f4c41ef2c72450bb8abc725f35e1
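As a rough illustration of the alternative, the Python sketch below derives one group id per sub-namespace directly from the attribute names, so no tags are involved. The function name and the rule it applies (names with three or more segments go to a sub-group, everything else stays in the general group) are assumptions made for this sketch, not something the linked PR or the tooling defines.

```python
def split_by_sub_namespace(group_id, namespace, attribute_names):
    """Map attribute names to `<group_id>` or `<group_id>.<sub>`.

    A name shaped like `<namespace>.<sub>.<rest>` goes to `<group_id>.<sub>`.
    Any other name stays in the general `<group_id>` group.
    """
    groups = {}
    for name in attribute_names:
        parts = name.split(".")
        if parts[0] == namespace and len(parts) >= 3:
            target = f"{group_id}.{parts[1]}"
        else:
            target = group_id
        groups.setdefault(target, []).append(name)
    return groups


db_groups = split_by_sub_namespace(
    "registry.db",
    "db",
    [
        "db.system",
        "db.cassandra.page_size",
        "db.cassandra.consistency_level",
        "db.mongodb.collection",
    ],
)
for group_id, names in db_groups.items():
    print(group_id, names)
```

With this layout, each markdown table corresponds to exactly one group id, and the general attributes remain under registry.db.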
What to do
We need to decide how we want to go forward and make it consistent across the repo.
I like the proposal!
We need to have meaningful guidelines on the following though:
- When to split into sub-groups (which also implies sub-sections in the corresponding registry readme) vs. keeping it in one table / group. I think we should do the split only in cases when the overall table gets too large otherwise, as one of the original purposes of the registry view is to have a flat, ordered list of attributes (that is easily navigable). I agree though that in some cases (like the DB, AWS) it makes sense to split it into sub-namespaces.
- When we do split into sub-groups, I think the splitting should be solely based on the sub-namespace! We should avoid semantic grouping of attributes in the registry (i.e. splitting a set of attributes into a separate group though the attributes have different sub-namespaces), because it would make navigation and discoverability difficult again.
Maybe we should add it to the guidelines? So new contributions will follow the process and semantic meaning of splitting the groups?
For the second option there will be no defined registry.db group, so we will not be able to generate a list of all db attributes without grouping if the need arises. Using tags this would be possible, but I'm not sure if this case is relevant.
@trisch-me registry.db is already defined today, and it contains the general attributes :).
Maybe we should add it to the guidelines? So new contributions will follow the process and semantic meaning of splitting the groups?
Yeah once we agree I will add to the guidelines.
Yes, it is defined and has all sub-attributes under it, with grouping done through tags, so generic attributes have the tag db-generic.
If we change it to different ids, we will not have all attributes under the main category.
I'm not against the second option. I just want to bring to our attention that in that case generating all sub-attributes for a given main category (db, aws, process, etc.) will not be possible (or I'm not aware of how to do so).
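For reference, the flat per-category list could in principle still be recovered from split groups by matching on the group id. The Python sketch below shows only the idea: the data is a hypothetical, abbreviated stand-in for the registry, and nothing here claims that the existing generation tooling supports this kind of selection.

```python
def attributes_in_category(groups, category_id):
    """Collect attribute names from `category_id` and all groups nested under it."""
    # The trailing dot keeps `registry.db` from matching an unrelated id
    # that merely starts with the same characters.
    nested_prefix = category_id + "."
    names = []
    for group in groups:
        if group["id"] == category_id or group["id"].startswith(nested_prefix):
            names.extend(group["attributes"])
    return names


# Hypothetical, abbreviated registry content after splitting into sub-groups.
GROUPS = [
    {"id": "registry.db", "attributes": ["db.system"]},
    {"id": "registry.db.cassandra", "attributes": ["db.cassandra.page_size"]},
    {"id": "registry.db.mongodb", "attributes": ["db.mongodb.collection"]},
    {"id": "registry.messaging", "attributes": ["messaging.system"]},
]

all_db = attributes_in_category(GROUPS, "registry.db")
print(all_db)
```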
I'd prefer to focus on the markdown and the final representation of the attributes. So far the yaml organization was not important.
Authors can split into subgroups, or use one group with tags when it helps them produce better markdown.
If we see that some groups have become too big and we'd like to change that, let's do it, but I don't understand the benefit of having any rigid guidelines on yaml organization unless we need them for something very specific (like auto-generating the registry).
I think we can provide soft guidance (e.g. in contrib.md?) to use one yaml group per table to be rendered in the MD.
E.g.:
- if http.request and http.response are rendered in the same registry table, they should be in the same yaml. If we ever feel like splitting them, we should be able to do it.
- messaging.kafka and messaging.rabbitmq should probably appear in different tables, so they should be defined in different yaml groups
Usually this would mean that system-specific attributes should be defined in individual groups. Since the registry will be auto-generated, tags will be useless, and all of this will keep registry groups from growing too large.
See #952 for the implementation on db/messaging.
So I think the initial idea of using groups of general + specific attributes is the way to go. I will try to add some guidance in the docs for this. Assigning to me.
I think this is obsolete as we have made several changes in the model/yaml/markdown since moving to Weaver. I also don't remember what I wanted to do, so I'm closing it 😅.
Feel free to re-open or create a new one if you think this is somehow still relevant.