semantic-conventions icon indicating copy to clipboard operation
semantic-conventions copied to clipboard

Add geo fields into attribute registry

Open gregkalapos opened this issue 1 year ago • 14 comments
trafficstars

Fixes #1033

Changes

Adds geo fields into the attribute registry into its own file.

This is another take on https://github.com/open-telemetry/semantic-conventions/pull/351, which basically copied the fields into both client and server. Here I aim for adding the fields only into the registry and not using it yet. At the same time, we have https://github.com/open-telemetry/build-tools/issues/240 with a few suggestions from @trisch-me.

So this PR will add the fields into the registry - then once https://github.com/open-telemetry/build-tools/issues/240 is implemented, we could use these geo fields by updating client.yaml and server.yaml and embed these fields into those.

Merge requirement checklist

gregkalapos avatar Jun 03 '24 16:06 gregkalapos

Are there any comments from other maintainers regarding this new namespace?

trisch-me avatar Jun 11 '24 14:06 trisch-me

Would it be possible to add a use case for those?

Do we expect them to be on span/metrics attributes? If the goal is to record events, they may go into the payload and should not be defined in the registry then.

lmolkova avatar Jun 28 '24 04:06 lmolkova

Would it be possible to add a use case for those?

In the OpenTelemetry collector we are adding a processor that would add those attributes to any trace, metric or log. The processor looks for an IP within the resource attributes and appends the related geolocation attributes. The component is still a nop (we are just adding support for multiple geolocation providers): https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/geoipprocessor

Additional context: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32663

rogercoll avatar Jun 28 '24 06:06 rogercoll

thanks for the clarification @rogercoll

Looking into the collector: source.address is not a resource attribute, it's dynamic and describes the network packet source.

https://github.com/open-telemetry/semantic-conventions/blob/9adff435c76ea3b8cba89babefc832d3dd3c1ab9/docs/general/attributes.md?plain=1#L143-L150

Is you intention to enrich individual telemetry signals or resources?

lmolkova avatar Jun 28 '24 15:06 lmolkova

Is you intention to enrich individual telemetry signals or resources?

@lmolkova The intention of the GeoIP processor is to have a flexible processor that can derive geo information from an IP.

What the source of the IP is (i.e. which attribue it's coming from) and what the destination (i.e. namespace) depends on the use case.

For example, on web server access logs the geoIP processor can be used to derive the geo information for each individual request's source.ip, allowing endusers to visualize where (geographically) requests come from.

But there could also be other use cases, that would require to read the IP from other attributes (either signal level or resource), etc.

AlexanderWert avatar Jun 30 '24 09:06 AlexanderWert

@AlexanderWert thanks for the context!

So if I understand correctly the intention is to

  • have ability to add geo attributes to any signal (traces, metrics, logs, resources)
  • based on the pre-existing arbitrary IP attribute on that signal

I.e. we do need to define them as attributes and not event payload fields.

Correct ?

Looking only from the semconv perspective, it could be somewhat misleading. I can have more than one IP on the telemetry item (e.g. host|device.ip, client.address, network.peer.address, ...) - if I add geo.*, it's not clear which one they apply to. What if some instrumentation (or PII redactor) reuses geo attributes to report end-user geography?

I assume embedding could be useful here and we could embed the geo namespace under related IPs?

E.g. if I configure geo-processor for client.address, it adds client.geo.*, or if I do it for source.address, processor adds source.geo.*

Effectively, geo namespace becomes something that's usually embedded and not used as stand-alone. This way we can enrich signals with geo information unambiguously and on any level (application, instr library, processor, backend).

Thoughts?

lmolkova avatar Jul 01 '24 17:07 lmolkova

@lmolkova you are correct, geo namespace is usually embedded into parent namespace and enriches it with geo information

In ECS we don't use geo namespace as standalone one in the root of events

trisch-me avatar Jul 01 '24 20:07 trisch-me

Effectively, geo namespace becomes something that's usually embedded and not used as stand-alone. This way we can enrich signals with geo information unambiguously and on any level (application, instr library, processor, backend).

Yes, that's absolutely right!

I think we should consider it as a two steps process to add geo attributes to semconv:

  1. This PR defines the core / plain geo attributes in the registry.
  2. Once we have the embedding functionality, we will use that to extend those to be used under source.*, client.*, etc.
groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: ...
    embed_namespaces:
      - id: geo

BTW, @trisch-me what's the status with embed? Do we have any blockers with that?

AlexanderWert avatar Jul 02 '24 06:07 AlexanderWert

what's the status with embed?

An option to embed not the full namespace but just particular fields (for example create client.geo.lat) is merged and we are preparing weaver release for it. Embedding the whole namespace at once is the next step

trisch-me avatar Jul 02 '24 08:07 trisch-me

Thanks again for the context @AlexanderWert and @trisch-me !

So maybe we can move forward by adding some disclaimer into the group brief like

Note: Geo attributes are typically used under another namespace, such as client.* and describe the location of the corresponding entity (device, end-user, etc). Semantic conventions that reference geo attributes (as a root namespace) or embed them (under their own namespace) SHOULD document what geo attributes describe in the scope of that convention.

?

Also, it would be great if we could start embedding some common attributes under client, source or other namespaces.

lmolkova avatar Jul 02 '24 18:07 lmolkova

related: https://github.com/open-telemetry/semantic-conventions/issues/1228

lmolkova avatar Jul 09 '24 22:07 lmolkova

Failing CI seems to be unrelated:

4m 56s
Run make markdown-link-check
semantic-conventions@ /home/runner/work/semantic-conventions/semantic-conventions
└── [email protected]


  ERROR: 1 dead links found in ./docs/system/system-metrics.md !
  [✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0
make: *** [Makefile:66: markdown-link-check] Error 1
Error: Process completed with exit code 2.

gregkalapos avatar Jul 30 '24 17:07 gregkalapos

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Aug 15 '24 03:08 github-actions[bot]

I reckon this PR is still on review (unstale)

rogercoll avatar Aug 19 '24 06:08 rogercoll

LGTM with some suggestions. The key part is geo.city.name (which I'm suggesting to remove or rename to geo.locality.name

Agreed. ECS calls it city_name, so it's a bit unfortunate to see that this will diverge from ECS; on the other hand, I also wasn't satisfied with city - calling this locality is clearly an improvement. I'd be ok to proceed with that, so I kept the field and changed it to locality.

gregkalapos avatar Nov 18 '24 20:11 gregkalapos

:(

  ERROR: 1 dead links found in ./docs/system/system-metrics.md !
  [✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0

gregkalapos avatar Nov 18 '24 20:11 gregkalapos

I think this is ready to be merged - I'm not authorized, so I can't do so.

Could someone please hit merge?

gregkalapos avatar Nov 18 '24 20:11 gregkalapos