
Attribute names: Unicode on OTLP, only `[a-z0-9._]` in OTel semconv

Open lmolkova opened this issue 1 year ago • 2 comments

Attribute names can be any Unicode sequence:

https://github.com/open-telemetry/semantic-conventions/blob/dd277f62f66f3342be33aec2c432f6bd959b379b/docs/general/attribute-naming.md?plain=1#L24

This makes sense for user apps that use the OTel API and OTLP, but such names are not accepted by our tooling (build-tools):

```python
import re

ID_RE = re.compile("([a-z](\\.?[a-z0-9_-]+)+)")
"""Identifiers must start with a lowercase ASCII letter and
contain only lowercase, digits 0-9, underscore, dash (not recommended) and dots.
Each dot must be followed by at least one allowed non-dot character."""
```
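To see what that pattern actually accepts, here is a small check. It assumes the tooling requires the whole name to match (hence `fullmatch`); `is_valid_id` is a hypothetical helper for illustration, not a build-tools function.

```python
import re

# Same pattern as the build-tools ID_RE quoted above.
ID_RE = re.compile("([a-z](\\.?[a-z0-9_-]+)+)")


def is_valid_id(name: str) -> bool:
    # Assumption: the entire name must match, so use fullmatch() rather than match().
    return ID_RE.fullmatch(name) is not None


for name in ["http.request.method", "http-method", "http_", "http..method", "Http.method", "db.système"]:
    print(name, is_valid_id(name))
```

Note that the pattern accepts `http-method` (dash) and `http_` (trailing underscore), while rejecting uppercase, consecutive dots, and non-ASCII characters such as `è`.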

We should document and enforce the rules that we have for semantic convention definitions in this repo:

  • only `a-z`, `0-9`, `.` and `_` are accepted
  • starts with a letter
  • ends with a letter or number
  • (no dashes: no existing attributes use them)
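A minimal sketch of a validator for these rules, in Python to match the build-tools snippet. `check_name` is hypothetical; reporting each violation separately is a design choice aimed at readable CI errors. It also assumes the existing "each dot must be followed by a non-dot character" rule from `ID_RE` carries over, which the list above does not state explicitly.

```python
import re

# Proposed character set: lowercase ASCII letters, digits, '.' and '_' (no dashes).
ALLOWED_CHARS = re.compile(r"[a-z0-9._]+")


def check_name(name: str) -> list[str]:
    """Return the rule violations for `name`; an empty list means it is valid."""
    if not name:
        return ["name must not be empty"]
    problems: list[str] = []
    if not ALLOWED_CHARS.fullmatch(name):
        problems.append("only a-z, 0-9, '.' and '_' are allowed")
    if not ("a" <= name[0] <= "z"):
        problems.append("must start with a letter a-z")
    if not ("a" <= name[-1] <= "z" or "0" <= name[-1] <= "9"):
        problems.append("must end with a letter or digit")
    # Assumption carried over from build-tools' ID_RE: no empty dot-separated segments.
    if ".." in name:
        problems.append("each '.' must be followed by a non-dot character")
    return problems


for name in ["http.request.method", "http-method", "http_", "1xx", "a..b"]:
    print(name, check_name(name) or "ok")
```

Unlike `ID_RE`, this rejects `http-method` and `http_`.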

These rules are necessary for code generation. They should also apply to metric names, units, event names, event payload fields, and other properties that are likely to be represented in code.

We can expand the list of allowed characters if we can find a way to support code generation for them.

lmolkova · Jun 05 '24 16:06

The use of `[a-z0-9._]` is currently merely a recommendation, not a strict restriction. I think it is fine to rely on that recommendation and make it the default behavior for our tools, but the tools may need to be able to handle exceptions.

I think use cases like this show that strictly prohibiting other characters may create interoperability problems with other standards.

tigrannajaryan · Jul 11 '24 14:07

Agreed. Our existing tooling imposes these limitations: CI checks would flag such names and fail. One of the reasons for the limitation is to be able to translate attribute, metric, and other names into constant names in code.

To support other characters, we'd need some mechanism to define a code-friendly name for such identifiers. We might need a similar mechanism for https://github.com/open-telemetry/semantic-conventions/issues/1118#issuecomment-2173803006 (phase 2).
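As a rough illustration of why the restriction helps code generation: a conforming name maps mechanically to a constant, while anything else needs an explicit code-friendly name. The `code_name` parameter below is a hypothetical stand-in for that mechanism, not an existing build-tools field, and the upper-snake-case mapping is an assumption about what generated constants look like.

```python
import re
from typing import Optional

# Names in the restricted character set map mechanically to constant names.
_CONVENTIONAL = re.compile(r"[a-z][a-z0-9._]*")


def constant_name(attr_name: str, code_name: Optional[str] = None) -> str:
    """Derive an upper-snake-case constant name for a semantic convention name.

    `code_name` is a hypothetical override for names outside [a-z0-9._].
    """
    source = code_name if code_name is not None else attr_name
    if not _CONVENTIONAL.fullmatch(source):
        raise ValueError(f"{attr_name!r} needs an explicit code-friendly name")
    return source.replace(".", "_").upper()


print(constant_name("http.request.method"))
print(constant_name("db.système", code_name="db.systeme"))
```

This naive `.` to `_` mapping can collide (`a.b_c` and `a_b.c` both become `A_B_C`), so a real generator would still need a uniqueness check.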

lmolkova · Jul 11 '24 22:07