semantic-conventions
semantic-conventions copied to clipboard
Add geo fields into attribute registry
Fixes #1033
Changes
Adds geo fields into the attribute registry into its own file.
This is another take on https://github.com/open-telemetry/semantic-conventions/pull/351, which basically copied the fields into both client and server. Here I aim for adding the fields only into the registry and not using it yet. At the same time, we have https://github.com/open-telemetry/build-tools/issues/240 with a few suggestions from @trisch-me.
So this PR will add the fields into the registry - then once https://github.com/open-telemetry/build-tools/issues/240 is implemented, we could use these geo fields by updating client.yaml and server.yaml and embed these fields into those.
Merge requirement checklist
- [x] CONTRIBUTING.md guidelines followed.
- [x] Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with
[chore]
- If your PR does not need a change log, start the PR title with
- [ ] schema-next.yaml updated with changes to existing conventions.
Are there any comments from other maintainers regarding this new namespace?
Would it be possible to add a use case for those?
Do we expect them to be on span/metrics attributes? If the goal is to record events, they may go into the payload and should not be defined in the registry then.
Would it be possible to add a use case for those?
In the OpenTelemetry collector we are adding a processor that would add those attributes to any trace, metric or log. The processor looks for an IP within the resource attributes and appends the related geolocation attributes. The component is still a nop (we are just adding support for multiple geolocation providers): https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/geoipprocessor
Additional context: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32663
thanks for the clarification @rogercoll
Looking into the collector:
source.address is not a resource attribute, it's dynamic and describes the network packet source.
https://github.com/open-telemetry/semantic-conventions/blob/9adff435c76ea3b8cba89babefc832d3dd3c1ab9/docs/general/attributes.md?plain=1#L143-L150
Is you intention to enrich individual telemetry signals or resources?
Is you intention to enrich individual telemetry signals or resources?
@lmolkova The intention of the GeoIP processor is to have a flexible processor that can derive geo information from an IP.
What the source of the IP is (i.e. which attribue it's coming from) and what the destination (i.e. namespace) depends on the use case.
For example, on web server access logs the geoIP processor can be used to derive the geo information for each individual request's source.ip, allowing endusers to visualize where (geographically) requests come from.
But there could also be other use cases, that would require to read the IP from other attributes (either signal level or resource), etc.
@AlexanderWert thanks for the context!
So if I understand correctly the intention is to
- have ability to add geo attributes to any signal (traces, metrics, logs, resources)
- based on the pre-existing arbitrary IP attribute on that signal
I.e. we do need to define them as attributes and not event payload fields.
Correct ?
Looking only from the semconv perspective, it could be somewhat misleading. I can have more than one IP on the telemetry item (e.g. host|device.ip, client.address, network.peer.address, ...) - if I add geo.*, it's not clear which one they apply to.
What if some instrumentation (or PII redactor) reuses geo attributes to report end-user geography?
I assume embedding could be useful here and we could embed the geo namespace under related IPs?
E.g. if I configure geo-processor for client.address, it adds client.geo.*, or if I do it for source.address, processor adds source.geo.*
Effectively, geo namespace becomes something that's usually embedded and not used as stand-alone.
This way we can enrich signals with geo information unambiguously and on any level (application, instr library, processor, backend).
Thoughts?
@lmolkova you are correct, geo namespace is usually embedded into parent namespace and enriches it with geo information
In ECS we don't use geo namespace as standalone one in the root of events
Effectively,
geonamespace becomes something that's usually embedded and not used as stand-alone. This way we can enrich signals with geo information unambiguously and on any level (application, instr library, processor, backend).
Yes, that's absolutely right!
I think we should consider it as a two steps process to add geo attributes to semconv:
- This PR defines the core / plain geo attributes in the registry.
- Once we have the
embeddingfunctionality, we will use that to extend those to be used undersource.*,client.*, etc.
groups:
- id: client
prefix: client
type: attribute_group
brief: ...
embed_namespaces:
- id: geo
BTW, @trisch-me what's the status with embed? Do we have any blockers with that?
what's the status with embed?
An option to embed not the full namespace but just particular fields (for example create client.geo.lat) is merged and we are preparing weaver release for it.
Embedding the whole namespace at once is the next step
Thanks again for the context @AlexanderWert and @trisch-me !
So maybe we can move forward by adding some disclaimer into the group brief like
Note: Geo attributes are typically used under another namespace, such as
client.*and describe the location of the corresponding entity (device, end-user, etc). Semantic conventions that referencegeoattributes (as a root namespace) or embed them (under their own namespace) SHOULD document whatgeoattributes describe in the scope of that convention.
?
Also, it would be great if we could start embedding some common attributes under client, source or other namespaces.
related: https://github.com/open-telemetry/semantic-conventions/issues/1228
Failing CI seems to be unrelated:
4m 56s
Run make markdown-link-check
semantic-conventions@ /home/runner/work/semantic-conventions/semantic-conventions
└── [email protected]
ERROR: 1 dead links found in ./docs/system/system-metrics.md !
[✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0
make: *** [Makefile:66: markdown-link-check] Error 1
Error: Process completed with exit code 2.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
I reckon this PR is still on review (unstale)
LGTM with some suggestions. The key part is
geo.city.name(which I'm suggesting to remove or rename togeo.locality.name
Agreed. ECS calls it city_name, so it's a bit unfortunate to see that this will diverge from ECS; on the other hand, I also wasn't satisfied with city - calling this locality is clearly an improvement. I'd be ok to proceed with that, so I kept the field and changed it to locality.
:(
ERROR: 1 dead links found in ./docs/system/system-metrics.md !
[✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0
I think this is ready to be merged - I'm not authorized, so I can't do so.
Could someone please hit merge?