
Labels vs Intake for service-specific fields


Many services have information that is valuable to users when included in span/transaction documents. For example, we want to attach the S3 bucket and object keys to S3 spans. In PHP, users would like to have the WordPress theme attached to transactions.

For this discussion we're not talking about fields that will be used in the UI. Those should always be standardized in the intake spec. Instead, these are extra pieces of data that will be present in the documents should users need them, and can be used for custom dashboarding.
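
To make the trade-off concrete, here is a minimal sketch of the two shapes the S3 example could take in the stored documents. It is written as Python dicts and is purely illustrative: the labels shape reflects how labels are indexed today, while the dedicated field names are hypothetical and not taken from the intake spec.

```python
# Illustrative document fragments only (not actual intake payloads).

# Option 1: labels -- no intake/APM Server change needed, but it shares the
# namespace users control through the agents' public label APIs.
s3_span_via_labels = {
    "span": {"name": "S3 GetObject", "type": "storage"},
    "labels": {"aws_s3_bucket": "my-bucket", "aws_s3_key": "my-object"},
}

# Option 2: dedicated fields standardized in the intake spec -- consistent
# across agents, but every new field needs a spec change and an APM Server
# release. The "aws" field names below are hypothetical.
s3_span_via_intake = {
    "span": {"name": "S3 GetObject", "type": "storage"},
    "aws": {"s3": {"bucket": "my-bucket", "key": "my-object"}},
}
```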

Labels

Pros:

  • Adding a new label does not require a new version of the intake/APM server -- fields added this way are immediately available to users
  • Reduces bloat in the intake spec

Cons:

  • Labels have historically been reserved for users (though this is a gray area with the OTLP intake, which maps extra attributes to labels)
  • Labels do not enforce consistency across agents

Intake

Pros:

  • Enforces consistency across agents
  • Reserves labels for users (again, a gray area due to our OTLP mapping)

Cons:

  • "Expensive" to add -- have to add to intake and then wait for next Server version
  • Adding service-specific fields will decrease the intake's adherence to ECS
  • Spec bloat due to potentially hundreds of service-specific fields in the long term (Otel has almost 30, just for DynamoDB)

Other alternatives

  • Add a label-like namespace that is not exposed to the user, which would keep these new fields separate from the user-controlled label namespace, while allowing for label-like flexibility.
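
For illustration, such a namespace could look something like the fragment below. The internal_labels name is entirely hypothetical and only meant to show the idea of a flexible, agent-controlled bucket of keys kept apart from user labels.

```python
# Hypothetical shape for a label-like namespace reserved for agents; the
# "internal_labels" key does not exist in the intake spec today.
s3_span_via_internal_namespace = {
    "span": {"name": "S3 GetObject", "type": "storage"},
    "labels": {"customer_tag": "checkout"},              # user-controlled labels stay separate
    "internal_labels": {"aws.s3.bucket": "my-bucket"},   # agent-controlled, label-like flexibility
}
```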

Thoughts?

/cc @felixbarny @AlexanderWert @gregkalapos @estolfo

basepi avatar Feb 08 '23 18:02 basepi

We discussed this with @AlexanderWert and @felixbarny

In general:

  • This is a fairly new area in terms of what kind of data we collect - so far we have only collected generic data (e.g. DB statement, HTTP request URL, and similar) that is not specific to a single technology (e.g. an S3 bucket).
  • As @basepi said, this data is only presented to users in a generic form (e.g. on the span flyout, or in Discover); it's very unlikely we'd ever build a specific UI for these fields or use them for correlations. Still, the information is very useful for the specific technology.
  • Given the above, this type of data is different from the general data we capture in the intake spec.

One aspect we should also think about is OTel and its semantic conventions. Ideally, once we collect technology-specific fields, we should use keys defined by the OTel semantic conventions. For S3, it's not unrealistic to expect that it'll be covered by OTel.

Agents with an OTel bridge already have otel.attributes - as noted above, these end up as labels in APM Server/Elasticsearch, but at least on the agent side they are distinct from labels.
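
For reference, this is what setting such an attribute looks like with the standard OpenTelemetry Python API; with an agent's OTel bridge (or a pure OTel agent), attributes set this way are what currently end up under labels in Elasticsearch. The attribute key is just an example.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Attributes set through the OTel API land in otel.attributes on the agent
# side and are currently mapped to labels in APM Server/Elasticsearch.
with tracer.start_as_current_span("S3 GetObject") as span:
    span.set_attribute("aws.s3.bucket", "my-bucket")
```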

So far, one suggestion we came up with is:

  • Send technology-specific data in otel.attributes if it's defined in the OTel semantic conventions
    • This enables us to collect useful information without adding to the intake spec and requiring an updated APM Server version
    • It still uses well-defined keys - additionally, we can keep a document in this repo as central book-keeping to list the otel.attributes collected by specific agents and make sure we're aligned - a much more lightweight process than adding to the intake spec
  • Send technology-specific data in labels if it's unlikely to become part of the OTel semantic conventions
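
A rough sketch of that routing rule as a helper function follows; the key set and the span fields used here are stand-ins, not real agent APIs.

```python
# Hypothetical helper illustrating the proposed rule; the semconv key set is
# only an example (keys assumed to be covered by the conventions) and the
# span attribute/label containers are stand-ins.
OTEL_SEMCONV_KEYS = {"aws.s3.bucket", "aws.s3.key"}

def record_service_field(span, key, value):
    """Attach a technology-specific field to a span.

    Keys defined by OTel semantic conventions go into otel.attributes;
    everything else falls back to labels.
    """
    if key in OTEL_SEMCONV_KEYS:
        span.otel_attributes[key] = value  # ends up in otel.attributes
    else:
        span.labels[key] = value           # ends up in labels
```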

One open point is how these will end up in Elasticsearch - especially de-dotting.

Current situation:

  • Both otel.attributes and labels currently end up as labels and are de-dotted. The de-dotting we do can be confusing to users - it's not ideal. If we want to change this, now would be an ideal time (most OTel bridges are still not GA, and OTel adoption is expected to grow over time).
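
To illustrate what de-dotting means here (assuming the current behavior of replacing dots in label keys with underscores, which is my understanding of what APM Server does today):

```python
# What the agent sends (attribute/label key with dots):
sent = {"aws.s3.bucket": "my-bucket"}

# What users find in the stored document today, after de-dotting
# (assumed underscore replacement):
stored_today = {"labels": {"aws_s3_bucket": "my-bucket"}}

# What they would find if we stopped de-dotting and kept keys flat
# (e.g. with subobjects: false on the labels mapping):
stored_without_dedotting = {"labels": {"aws.s3.bucket": "my-bucket"}}
```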

Options:

  • We keep mapping both labels and otel.attributes to labels, but set subobjects = false and skip de-dotting for data coming from otel.attributes (see the mapping sketch after this list). Other labels would remain de-dotted to avoid breaking existing users who rely on labels. This opens up the question of what we do with existing pure OTel agents (should their attribute-to-label mapping be adapted or not?)
  • We make a breaking change and stop de-dotting for both otel.attributes and labels. We are already discussing de-dotting for metrics, and it seems that, long term, we want to stop de-dotting there as well, so it would be nice to be aligned with that. We could also add a config option to mitigate the breaking change.
  • We add an additional field for otel.attributes and stop storing those on labels - this would be more complex and would require changes to how we handle plain OTel agents (changing the mapping of their attributes), but it may enable us to avoid breaking changes.
  • Something else?
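
For context on the first option, subobjects: false is the Elasticsearch mapping parameter that keeps dotted field names flat instead of expanding them into nested objects. A minimal, hypothetical mapping fragment (the real apm-data mappings are more involved) could look like:

```python
# Hypothetical Elasticsearch mapping fragment, expressed as a Python dict.
labels_mapping = {
    "mappings": {
        "properties": {
            "labels": {
                "type": "object",
                "subobjects": False,  # keep keys like "aws.s3.bucket" as flat field names
            }
        }
    }
}
```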

So in sum:

  • On the agent side, I think we should align with OTel semantic conventions - they cover most technology-specific fields - and pack those values into otel.attributes. At the same time, we should try to contribute to the OTel semantic conventions when specific fields are not yet defined there.
  • It's TBD how we handle this in APM Server.

gregkalapos avatar Feb 09 '23 11:02 gregkalapos

On the agent side, I think we should align with OTel semantic conventions - they cover most technology-specific fields - and pack those values into otel.attributes. At the same time, we should try to contribute to the OTel semantic conventions when specific fields are not yet defined there.

@gregkalapos Do we expect to block until the upstream contribution to the semantic conventions is accepted? Probably fine, just haven't been involved with the semantic conventions so I don't know how lengthy that process is.

basepi avatar Feb 09 '23 18:02 basepi

On the agent side, I think we should align with OTel semantic conventions - they cover most technology-specific fields - and pack those values into otel.attributes. At the same time, we should try to contribute to the OTel semantic conventions when specific fields are not yet defined there.

@gregkalapos Do we expect to block until the upstream contribution to the semantic conventions is accepted? Probably fine, just haven't been involved with the semantic conventions so I don't know how lengthy that process is.

Good question. For things where there isn't much room for discussion, I wouldn't block - for the S3 bucket, I don't think much discussion is needed, so I wouldn't block on that - but I don't have a strong feeling about it.

Also, a related idea: I don't know how much we want to document or communicate this, but if we already list these fields somewhere anyway (e.g. in this repo), we could add a column to track whether something is already part of the OTel semantic conventions. That way we can communicate to users that fields which are already part of the OTel spec will be as stable as the OTel spec, while others that are not yet in the OTel spec may change later (e.g. if our proposed key names are changed during OTel spec PR review).

gregkalapos avatar Feb 09 '23 19:02 gregkalapos