semantic-conventions
[cloud provider] `host.id` semantics are too broad
`host.id` is currently used as a catch-all convention for any kind of ID, whether it comes from a cloud provider or from the machine itself. This makes it difficult for vendors to rely on it to retrieve specific cloud provider IDs.
Currently, a single host has a single value for `host.id`. In certain environments you can rely on other `cloud.*` attributes, such as `cloud.platform`, to understand what the value in `host.id` means. For example, if `cloud.platform` is `aws_ec2`, this implicitly guarantees that `host.id`, if present, holds the AWS EC2 instance ID.
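The implicit, consumer-side inference described above can be sketched roughly as follows. This is a minimal illustration, assuming a resource is a flat `dict` of attribute names to string values. The function `interpret_host_id` and the mapping `HOST_ID_MEANING` are hypothetical. The `cloud.platform` values other than `aws_ec2` are taken from the cloud semantic conventions as illustrative examples, not as a complete table. The sketch only works because each host carries exactly one `host.id` value.

```python
from typing import Mapping, Optional

# Hypothetical, illustrative mapping: which kind of ID a consumer might infer
# host.id holds for a given cloud.platform value. Not exhaustive or official.
HOST_ID_MEANING: Mapping[str, str] = {
    "aws_ec2": "AWS EC2 instance ID",
    "azure_vm": "Azure VM ID",
    "gcp_compute_engine": "GCE instance ID",
}


def interpret_host_id(resource: Mapping[str, str]) -> Optional[str]:
    """Describe what host.id means, or return None if it cannot be inferred."""
    host_id = resource.get("host.id")
    if host_id is None:
        return None
    platform = resource.get("cloud.platform")
    meaning = HOST_ID_MEANING.get(platform) if platform else None
    if meaning is None:
        # No usable platform hint: host.id could be a machine-id,
        # a Consul node ID, or anything else.
        return None
    return f"{meaning}: {host_id}"
```

For example, `interpret_host_id({"cloud.platform": "aws_ec2", "host.id": "i-0abc123"})` yields `"AWS EC2 instance ID: i-0abc123"`. The same call without `cloud.platform` yields `None`, because nothing else identifies the ID's meaning.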
Proposals like #576 would allow a single host to have multiple possible values for `host.id`, which makes it impossible for a vendor to determine what `host.id` actually means.
Within the OpenTelemetry GitHub org, these are the current values for `host.id` other than `machine-id`:
- Azure VM ID (.NET contrib, Python contrib, Ruby contrib, Collector contrib)
- Azure App Service's `WEBSITE_HOSTNAME` (Python contrib)
- AWS EC2 instance ID (PHP contrib, Collector contrib)
- GKE host ID, GCE host ID (Go contrib, Ruby contrib, Collector contrib)
- Consul node ID (Collector)
One solution is to introduce semantic conventions that are specific to a given cloud provider. For example, we already have `gcp.gce.instance.name`, and #600 proposes a similar convention for AWS EC2.
cc @open-telemetry/semconv-system-approvers
The challenge with any ID is that it is only truly usable within a specific context, and a single host exists in multiple contexts. Should we model `host.id` as a graph, where each context-specific ID is a node and the graph has its own ID that is shared across all contexts?
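The graph idea can be sketched in a few lines to make it concrete. This is a minimal, hypothetical model and not a proposed convention. `HostGraph`, `HostRegistry`, the shared `graph_id`, and context names such as `"aws_ec2"` or `"machine-id"` are all illustrative assumptions. Each `(context, id)` pair is a node, and every node resolves to the one graph ID shared by the host.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical: a node is a (context, id) pair, e.g. ("aws_ec2", "i-0abc123").
ContextId = Tuple[str, str]


@dataclass
class HostGraph:
    graph_id: str  # hypothetical ID shared across all contexts for one host
    nodes: Dict[str, str] = field(default_factory=dict)  # context -> context-specific ID


class HostRegistry:
    """Resolves any context-specific ID to the host's shared graph ID."""

    def __init__(self) -> None:
        self._by_node: Dict[ContextId, HostGraph] = {}

    def add(self, graph: HostGraph) -> None:
        # Index every node of the graph so any context-specific ID can be looked up.
        for context, value in graph.nodes.items():
            self._by_node[(context, value)] = graph

    def resolve(self, context: str, value: str) -> Optional[str]:
        """Return the shared graph ID for (context, value), or None if unknown."""
        graph = self._by_node.get((context, value))
        return graph.graph_id if graph is not None else None
```

With `registry.add(HostGraph("host-1", {"aws_ec2": "i-0abc123", "machine-id": "4c2f9e"}))`, both `registry.resolve("aws_ec2", "i-0abc123")` and `registry.resolve("machine-id", "4c2f9e")` return `"host-1"`. Each ID stays meaningful within its own context, and the shared ID links them together.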
This is something the Entities SIG should look into before we make further progress.