opentelemetry-specification

[entities-wg] Rubric for evaluation of Entity signal designs

Open jsuereth opened this issue 1 year ago • 3 comments

Today during the otel-entities WG, we discussed values we'd use in rubrics to evaluate future OTEPs/Designs on entities. These are a set of principles we'd like to uphold, but can be flexible on. Designs to move forward with entities should list these conditions in pros/cons (at a minimum).

I'm opening this issue to record decisions, and I'll add a follow-on comment with the unaddressed items we still need to decide upon.

Core Principles

  • Resource detectors (soon to be entity detectors) need to be composable / disjoint

  • New entities added by extension should not break existing code

    • This means if a user takes an action to leverage a new entity, things may change.
    • If a user upgrades to, e.g., a new SDK, out-of-the-box (OOTB) defaults must not break their existing o11y flow.
  • Navigational attributes need to exist and can be used to identify an entity, but could be augmented with a UUID or other aspects. Having ONLY a UUID for entity identification is not good enough.

    • o11y needs to be actionable, e.g. you should be able to run kubectl get pods <name> for a k8s pod.
    • We'll need to work through design issues here - LOTS of discussion and options and nuanced trade-offs.
    • Navigational identity should not change unless the entity identity itself changes.
  • Collector augmentation/enrichment (e.g., of resources) should be extensible and not hard-coded. We need a general algorithm, not specific rulesets.

    • e.g. SDK + Collector both having k8s detection - this should be supported.
    • This may lead to additional issues we'll need to address.
  • Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal

    • Priorities - This is important if there is overlap in information. We should see if we can avoid this situation.
    • e.g. Java discovers service.name in a variety of ways, run in a default order. It's realistic to think users will want to reorder these.
  • For an SDK - ALL telemetry should be associated with the same set of entities (resource labels).

    • The association of signals relies on using the same entities to navigate those signals
    • We need to make sure identity stays the same even when there are multiple observers.

    These are some of the principles we agreed are important and will use in our rubric to evaluate design choices.
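The detector-composition and priority principles can be illustrated with an ordered list of detectors where earlier entries win and any overlap is reported instead of silently resolved. This is a minimal sketch under stated assumptions: the `Detector` shape, the `compose` helper, and both detector functions are hypothetical and are not the OpenTelemetry SDK's detector API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical detector shape: (name, function returning attributes).
# This is NOT the OpenTelemetry SDK's resource detector interface.
Detector = Tuple[str, Callable[[], Dict[str, str]]]


def compose(detectors: List[Detector]) -> Tuple[Dict[str, str], List[str]]:
    """Merge detector output in priority order: earlier detectors win.

    Disjoint detectors merge cleanly; any overlapping key with a differing
    value is returned as a conflict so it is visible, not silently resolved.
    """
    merged: Dict[str, str] = {}
    source: Dict[str, str] = {}
    conflicts: List[str] = []
    for name, detect in detectors:
        for key, value in detect().items():
            if key not in merged:
                merged[key] = value
                source[key] = name
            elif merged[key] != value:
                conflicts.append(f"{key}: kept {source[key]}, ignored {name}")
    return merged, conflicts


# Two hypothetical Java-style detectors that both discover service.name.
def from_env() -> Dict[str, str]:
    return {"service.name": "checkout"}


def from_jar_manifest() -> Dict[str, str]:
    return {"service.name": "checkout-service", "service.version": "1.2"}


default_order: List[Detector] = [("env", from_env), ("jar", from_jar_manifest)]
attrs, conflicts = compose(default_order)
print(attrs)      # {'service.name': 'checkout', 'service.version': '1.2'}
print(conflicts)  # ['service.name: kept env, ignored jar']

# A user who prefers the manifest value only has to reorder the list.
user_order: List[Detector] = [("jar", from_jar_manifest), ("env", from_env)]
print(compose(user_order)[0]["service.name"])  # checkout-service
```

Surfacing conflicts rather than dropping them keeps the "avoid overlap if we can" goal checkable: a set of truly disjoint detectors always yields an empty conflict list.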

jsuereth, Jun 06 '24 21:06

Issue 1 - Multi observers

We need the ability to understand if two observers are discussing the same entity. Should the entity have the same ID or should this situation be detectable?
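One way to make the multi-observer case detectable, consistent with the navigational-identity principle, is to derive identity from the entity type plus its identifying attributes instead of an observer-assigned UUID. This is a sketch only: `identity_key`, `kubectl_command`, and the report layout are hypothetical; the attribute names follow the OpenTelemetry Kubernetes semantic conventions.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical identity key: entity type plus identifying attributes.
IdentityKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def identity_key(report: Dict) -> IdentityKey:
    # Descriptive attributes and observer-assigned UUIDs are deliberately
    # excluded, so two observers of the same pod derive the same key.
    return report["type"], frozenset(report["identifying"].items())


def kubectl_command(report: Dict) -> str:
    # Navigational identity stays actionable: a pod key maps to a kubectl lookup.
    ident = report["identifying"]
    return f"kubectl get pods {ident['k8s.pod.name']} -n {ident['k8s.namespace.name']}"


# The same pod as seen by an in-process SDK and by a Collector receiver.
sdk_report = {
    "type": "k8s.pod",
    "identifying": {"k8s.namespace.name": "shop", "k8s.pod.name": "checkout-7d9f"},
    "uuid": "uuid-assigned-by-sdk",
}
collector_report = {
    "type": "k8s.pod",
    "identifying": {"k8s.pod.name": "checkout-7d9f", "k8s.namespace.name": "shop"},
    "uuid": "uuid-assigned-by-collector",
}

print(identity_key(sdk_report) == identity_key(collector_report))  # True
print(sdk_report["uuid"] == collector_report["uuid"])               # False
print(kubectl_command(collector_report))  # kubectl get pods checkout-7d9f -n shop
```

Under this assumption the two reports are recognizably the same entity even though their UUIDs differ, which is exactly the case a UUID-only identity could not detect.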

jsuereth, Jun 06 '24 21:06

Issue 2 - ENV variable for resources

We need some interaction between entity, resource + ENV variable that doesn't break OTEL operator users (and others leveraging ENV variables).

jsuereth, Jun 06 '24 21:06

Issue 3 - Duplicate entity reporting

Should we prevent duplicate entities from being emitted across all possible telemetry sources? Should we have an automatic way, e.g. in the Collector, to unify duplicate sources of entities and only emit one definitive signal?

jsuereth, Jun 06 '24 21:06

Copying notes from the latest SIG meeting on additional principles:

Issue 1 - Multi observers

Two observers are discussing/reporting the same entity - is this something we permit or consider a bug?

  • Users will need to be involved in solving multi-observer merge
  • We need the solution to allow this and it's a very important problem to get right
  • We should try to solve the common ~80% of cases so that users only need to worry about it in advanced cases.
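The "solve the common case, let users handle the rest" idea can be sketched as a merge of descriptive attributes for an entity whose identity has already been matched, with a pluggable policy deciding only the disagreements. Everything here (`MergePolicy`, `prefer`, `merge_descriptive`, `labels_from_sdk`, and the observer names) is hypothetical and illustrative, not an agreed design.

```python
from typing import Callable, Dict, List

# Hypothetical policy: given an attribute key and each observer's value for it,
# choose the value to keep.
MergePolicy = Callable[[str, Dict[str, str]], str]


def prefer(order: List[str]) -> MergePolicy:
    """A simple default: trust observers in the given rank order."""
    def policy(key: str, values: Dict[str, str]) -> str:
        for observer in order:
            if observer in values:
                return values[observer]
        # Unranked observers fall back to a deterministic choice by name.
        return values[min(values)]
    return policy


def merge_descriptive(reports: Dict[str, Dict[str, str]],
                      policy: MergePolicy) -> Dict[str, str]:
    """Merge descriptive attributes from several observers of one entity.

    Values that all observers agree on pass straight through; only
    disagreements are handed to the policy.
    """
    merged: Dict[str, str] = {}
    for key in sorted({k for attrs in reports.values() for k in attrs}):
        values = {obs: attrs[key] for obs, attrs in reports.items() if key in attrs}
        distinct = set(values.values())
        merged[key] = distinct.pop() if len(distinct) == 1 else policy(key, values)
    return merged


reports = {
    "sdk": {"k8s.deployment.name": "checkout", "k8s.pod.label.tier": "web"},
    "collector": {"k8s.deployment.name": "checkout", "k8s.node.name": "node-1",
                  "k8s.pod.label.tier": "frontend"},
}

# The common case: a built-in ranking resolves the one conflict.
print(merge_descriptive(reports, prefer(["collector", "sdk"])))


# An advanced user plugs in their own rule for specific keys.
def labels_from_sdk(key: str, values: Dict[str, str]) -> str:
    if key.startswith("k8s.pod.label.") and "sdk" in values:
        return values["sdk"]
    return prefer(["collector", "sdk"])(key, values)


print(merge_descriptive(reports, labels_from_sdk)["k8s.pod.label.tier"])  # web
```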

Issue 2 - ENV variable for resources

We need some interaction between entity, resource + ENV variable that doesn't break OTEL operator users (and others leveraging ENV variables). Ideally the platform can push identity/entity into SDKs via ENV variable.

  • This is a problem we should solve and include in our solution.
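For context on the ENV variable path, OTEL_RESOURCE_ATTRIBUTES carries comma-separated key=value pairs with percent-encoded values. The sketch below parses that format and layers it over locally detected attributes. The helper names are hypothetical, and the precedence rule (platform-pushed values override detected ones) is an assumption for illustration, not settled behavior.

```python
import os
from typing import Dict, Mapping
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse OTEL_RESOURCE_ATTRIBUTES-style text: comma-separated key=value
    pairs whose values are percent-encoded. Malformed pairs are skipped in
    this sketch; real SDKs may report errors differently."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        key = key.strip()
        if sep and key:
            # Decode after splitting, so an encoded comma (%2C) survives.
            attrs[key] = unquote(value.strip())
    return attrs


def resource_with_env(detected: Mapping[str, str],
                      environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # Assumption for this sketch: values pushed by the platform (for example,
    # set by the OpenTelemetry Operator) override locally detected ones.
    merged = dict(detected)
    merged.update(parse_resource_attributes(environ.get("OTEL_RESOURCE_ATTRIBUTES", "")))
    return merged


env = {"OTEL_RESOURCE_ATTRIBUTES":
       "k8s.namespace.name=shop,k8s.pod.name=checkout-7d9f,service.namespace=shop%2Cpayments"}
detected = {"host.name": "node-1", "k8s.pod.name": "unknown"}
print(resource_with_env(detected, env))
```

Passing `environ` explicitly keeps the sketch testable; in a real process it defaults to `os.environ`.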

Issue 3 - Duplicate entity reporting

Should we prevent duplicate entities from being emitted across all possible telemetry sources? Should we have an automatic way, e.g. in the Collector, to unify duplicate sources of entities and only emit one definitive signal?

  • This is a problem we can't solve entirely in OpenTelemetry.
  • We should provide tools to solve this in OpenTelemetry Collector
  • We should provide a data model with guidance on how to solve this problem.

jsuereth, Jul 18 '24 16:07

Captured in https://github.com/open-telemetry/oteps/pull/264.

jack-berg, Sep 04 '24 15:09