[Grafana Cloud OTLP Endpoint] Support specifying OTel Resource Attributes promoted as Loki labels
Is your feature request related to a problem? Please describe.
Context: As discussed with @sandeepsukhani and many others, we want to simplify Loki's OpenTelemetry ingestion path and move away from the otel2loki converters available through the OpenTelemetry Collector Loki Exporter and the Alloy otelcol.exporter.loki in favor of the newly introduced Loki OTLP Endpoint.
However, we have identified a limitation: there is no way to specify which OTel resource attributes should be promoted as Loki labels:
- Self-managed Loki supports overwriting the default list of resource attributes that are promoted as labels through distributor: otlp_config / default_resource_attributes_as_index_labels (docs here), but Grafana Cloud Logs and the Grafana Cloud OTLP Endpoint do not provide such a stack-wide config option.
- The Loki OTLP Endpoint doesn't offer a mechanism for the logs ingestion pipeline to specify additional resource attributes to promote as labels, similar to the loki.resource.labels attribute that was available when using the OpenTelemetry Collector Loki Exporter.
Describe the solution you'd like
I would like:
- A global configuration parameter in Grafana Cloud equivalent to distributor: otlp_config / default_resource_attributes_as_index_labels (docs here) to overwrite the default list of resource attributes that are promoted as labels.
- A mechanism for the OTLP ingestion pipeline to specify additional resource attributes to promote as labels. This mechanism could be slightly different from the OpenTelemetry Collector Loki Exporter loki.resource.labels mechanism, as the desired solution would not be about overwriting the global list of resource attributes promoted as labels (see default_resource_attributes_as_index_labels) but about extending it.
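To make the difference between the two requests concrete, here is a minimal Python sketch of "overwrite" versus "extend" semantics for the list of promoted resource attributes. The function names and the short default list are hypothetical and chosen for illustration only; this is not how Loki or Grafana Cloud implements it.

```python
# Illustrative sketch only: names and defaults below are assumptions,
# not Loki's or Grafana Cloud's actual implementation.

# Shortened, hypothetical default list of promoted resource attributes.
DEFAULT_PROMOTED = ["service.name", "service.namespace", "k8s.cluster.name"]


def overwrite_promoted(defaults, configured):
    """Global option: a configured list replaces the defaults entirely."""
    return list(configured) if configured is not None else list(defaults)


def extend_promoted(promoted, additional):
    """Pipeline option: keep the current list and append new names once."""
    result = list(promoted)
    for name in additional:
        if name not in result:
            result.append(name)
    return result


# A stack-wide override followed by a per-pipeline extension.
global_list = overwrite_promoted(DEFAULT_PROMOTED, ["service.name", "k8s.namespace.name"])
pipeline_list = extend_promoted(global_list, ["host.name", "service.name"])
print(pipeline_list)  # ['service.name', 'k8s.namespace.name', 'host.name']
```

The point of keeping the two operations separate is that a pipeline can add labels it cares about without having to know, or accidentally drop, what the stack-wide list already contains.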
Describe alternatives you've considered
Continue to do the otel2loki conversion through the OpenTelemetry Collector Loki Exporter and Alloy otelcol.exporter.loki. However, this puts more burden on Loki users, and neither of these converters leverages Loki V3 metadata.
Additional context
Similar to the problem Grafana Labs Community - Add additional index labels in Loki 3.0 via OTLP
Given the removal of the lokiexporter in September, this feature gap hits us pretty hard as Grafana Cloud users. Any update on the possibility of promoting resource attributes to indexed labels on Loki using the OTLP exporter & endpoint?
@fredrikgh can you please help us understand what kind of attribute you want to promote as Loki labels? Are they additional standard resource attributes? custom resource attributes? What information do these attributes convey?
@stevendungan can you please help here?
Following your question @cyrille-leclerc, here's my use-case for what it's worth
I wanted to use this functionality as a way to circumvent the fact that the level field is no longer present. It's replaced with detected_level in Loki 3.1, but that is not supported by Grafana and is not indexed. There's a bug for this of course.
I also have a custom field that I use a lot in the dropdown in the explore view, similar to service_name. If my field cannot get indexed I won't have it in the drop-downs in grafana cloud.
Grafana Cloud documentation says:
Because it is too costly from a cardinality perspective, Grafana Loki indexes a few attributes from log entries instead of indexing all available attributes or the entire log message. As such, you must provide hints to the Loki translator, stating which attributes to promote to Loki labels. You can do this by adding new synthetic attributes, which are read by the Loki translator and removed before the data is sent over the network. The following snippet shows how the processors section looks when you add a resource processor that adds the loki.resource.labels hint. This example tells the Loki translator that the host_name resource attribute should be promoted to a label. You are not required to add labels, and every entry that passes through the Loki exporter will have a static label exporter with the value OTLP by default. For more information about labels and how to chose the right ones for your use case, refer to the Loki documentation.
But this behavior doesn't actually work when sending over OTLP to the Grafana Cloud OTLP endpoint in our experience for any resource attribute we want to promote to a label.
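For readers unfamiliar with the hint mechanism the quoted documentation describes, the following Python sketch shows the general idea: a synthetic attribute names the attributes to promote, the translator reads it, and the hint is removed before the data is sent on. It assumes the hint value is a comma-separated list of attribute names, and the function name is hypothetical. It illustrates the exporter-side behavior described in the quote, not the OTLP endpoint, which per the experience above does not honor the hint.

```python
# Illustrative sketch only: apply_label_hint is a hypothetical helper and the
# comma-separated hint format is an assumption, not the exporter's real code.

HINT_KEY = "loki.resource.labels"


def apply_label_hint(resource_attributes):
    """Return (labels, remaining_attributes) after processing the hint."""
    attrs = dict(resource_attributes)
    # The hint is synthetic: read it, then drop it from the outgoing data.
    hint = attrs.pop(HINT_KEY, "")
    wanted = [name.strip() for name in hint.split(",") if name.strip()]

    # Per the quoted docs, every entry gets a static exporter=OTLP label.
    labels = {"exporter": "OTLP"}
    for name in wanted:
        if name in attrs:
            labels[name] = attrs[name]
    return labels, attrs


labels, remaining = apply_label_hint(
    {"host_name": "web-1", "region": "eu", "loki.resource.labels": "host_name"}
)
print(labels)     # {'exporter': 'OTLP', 'host_name': 'web-1'}
print(remaining)  # {'host_name': 'web-1', 'region': 'eu'}
```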
@fredrikgh can you please help us understand what kind of attribute you want to promote as Loki labels? Are they additional standard resource attributes? custom resource attributes? What information do these attributes convey?
One example we had was to have loki labels for exception and/or scope of a log entry, i.e. custom attributes.
Grafana Cloud documentation says:
... But this behavior doesn't actually work when sending over OTLP to the Grafana Cloud OTLP endpoint in our experience for any resource attribute we want to promote to a label.
@adrielp This documentation is outdated, it predates the introduction of Loki structured metadata, we are going to refresh this section.
Please use OTel log attributes to capture logs metadata (eg thread.name...). Note that the OTel auto instrumentation of logging frameworks is usually capable of capturing interesting metadata.
We are sorry for the inconvenience. Would this solution meet your expectations?
One example we had was to have loki labels for exception and/or scope of a log entry, i.e. custom attributes.
Thanks @fredrikgh , would you by any chance have example values and a sense of the cardinality?
In particular, I would be interested in understanding:
exception, is it:
- Just a marker like true/false to have a different data management policy, for example a different retention policy?
- The exception type like NullPointerException
- Or also include the exception message like InvalidFormatException: '123azerty' is not a valid integer
scope, is it:
- A reference to the OpenTelemetry instrumentation scope name which is mapped to the logger name by the OTel auto instrumentation of logging frameworks, for example com.mycompany.OrderService
Thanks @cyrille-leclerc - glad the updates are going to be made. I'd also keep an eye on the entity OTEP that relates to resource attributes. I think these types of things will be important for labels as things evolve.
Thanks @fredrikgh , would you by any chance have example values and a sense of the cardinality?
In particular, I would be interested in understanding:
exception, is it:
- Just a marker like true/false to have a different data management policy, for example a different retention policy?
- The exception type like NullPointerException
- Or also include the exception message like InvalidFormatException: '123azerty' is not a valid integer
scope, is it:
- A reference to the OpenTelemetry instrumentation scope name which is mapped to the logger name by the OTel auto instrumentation of logging frameworks, for example com.mycompany.OrderService
@cyrille-leclerc It would be NullPointerException and com.mycompany.OrderService respectively. I suppose technically, these aren't to be considered resource attributes. But some mechanism of getting these indexed would be very useful.
@adrielp: Thanks @cyrille-leclerc - glad the updates are going to be made. I'd also keep an eye on the https://github.com/open-telemetry/oteps/pull/264 that relates to resource attributes. I think these types of things will be important for labels as things evolve.
We are aligned here. We have several engineers who contribute to this OTEP, both to better surface the concept of entities in OTel and to help improve the support for high dimensionality in Prometheus.
@fredrikgh: @cyrille-leclerc It would be NullPointerException and com.mycompany.OrderService respectively. I suppose technically, these aren't to be considered resource attributes. But some mechanism of getting these indexed would be very useful.
Thanks @fredrikgh. Please pardon my curiosity but what is your use case for this level of details in labels and thus this cardinality on the log streams?
Applications in Java have hundreds of logger names (e.g. com.mycompany.OrderService) and use dozens of exception classes (e.g. NullPointerException).
I suspect we may not be aware of the use case you are solving here.
@cyrille-leclerc we were misusing them initially. We have a limitation on error metrics exported by the apps, and built log data dashboards for log meta analysis instead. E.g. error count by certain metadata, backed by recording rules. But we've accomplished this now with label_format and all is well.
Getting standard resource attributes such as cluster, node, pod etc. as indexed labels is a more valid use case, and more fitting to resource attributes. I may have missed it, but have you settled on how you intend to make this possible? This is indeed where we used loki.resource.labels before.
@cyrille-leclerc on the standard resource attributes indexed by default in Grafana Cloud (via OTLP endpoint), is this list (taken from Loki docs) still accurate?
- service.name
- service.namespace
- service.instance.id
- deployment.environment
- cloud.region
- cloud.availability_zone
- k8s.cluster.name
- k8s.namespace.name
- k8s.pod.name
- k8s.container.name
- k8s.replicaset.name
- k8s.deployment.name
- k8s.statefulset.name
- k8s.daemonset.name
- k8s.cronjob.name
- k8s.job.name
In terms of resource attributes, this list seems exhaustive. Honestly, our case to customize would only be semantic. E.g. remove k8s...name to reduce bloat when dealing with the raw labels in Explore. But maybe these conventions are required by Grafana app obs, as they are for metrics/traces.
Seems level is indexed now too, is that the only non-resource attribute?
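As a rough illustration of what the list above means in practice, this Python sketch splits a record's resource attributes into index labels (names on the list) and structured metadata (everything else). The function names are hypothetical, and the name normalization is simplified to replacing dots with underscores, which is an assumption about the general behavior rather than Loki's exact rules.

```python
# Illustrative sketch only: helper names are hypothetical and the
# dot-to-underscore normalization is a simplifying assumption.

DEFAULT_INDEX_ATTRIBUTES = (
    "service.name", "service.namespace", "service.instance.id",
    "deployment.environment", "cloud.region", "cloud.availability_zone",
    "k8s.cluster.name", "k8s.namespace.name", "k8s.pod.name",
    "k8s.container.name", "k8s.replicaset.name", "k8s.deployment.name",
    "k8s.statefulset.name", "k8s.daemonset.name", "k8s.cronjob.name",
    "k8s.job.name",
)


def to_label_name(attribute_name):
    """Simplified normalization: dots become underscores."""
    return attribute_name.replace(".", "_")


def split_resource_attributes(resource_attributes, promoted=DEFAULT_INDEX_ATTRIBUTES):
    """Return (index_labels, structured_metadata) for one log record."""
    labels, metadata = {}, {}
    for name, value in resource_attributes.items():
        target = labels if name in promoted else metadata
        target[to_label_name(name)] = str(value)
    return labels, metadata


labels, metadata = split_resource_attributes(
    {"service.name": "checkout", "k8s.pod.name": "checkout-7d9f", "host.name": "node-12"}
)
print(labels)    # {'service_name': 'checkout', 'k8s_pod_name': 'checkout-7d9f'}
print(metadata)  # {'host_name': 'node-12'}
```

Passing a shorter `promoted` tuple to `split_resource_attributes` shows the effect of trimming the list: the dropped attributes are not lost, they simply move from labels to metadata.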
Hello @fredrikgh ,
is this list (taken from Loki docs) still accurate?
This list is accurate.
But maybe these conventions are required by Grafana app obs, as they are for metrics/traces.
This list is primarily driven by the ongoing unification of resource attribute promotion as labels in Loki and Prometheus/Mimir and the fact that we want to enable slicing and dicing metrics by common dimensions. Promotion of resource attributes as Prometheus metrics is still being developed, not GA yet.
E.g. remove k8s...name to reduce bloat when dealing with the raw labels in Explore
You can edit the Loki config to change this list. Can you please detail the pain point having those additional labels in your logs exploration UX? How different is it having OTel resource attributes and logs attributes as Loki metadata versus having them as labels?
We are discussing promoting a few more resource attributes as labels in Loki and Prometheus/Mimir:
- Adding deployment.environment.name to the list as it's now replacing deployment.environment. We expect a transition period during which both names may be used, so we don't plan to remove deployment.environment from the list.
- Adding service.version as it's very common to slice data by service version to identify whether application upgrades cause regressions.
How do you see these additions, would they make sense to you? Would they increase the UX challenges you have in mind?
Hi @cyrille-leclerc,
Thanks for the swift response.
E.g. remove k8s...name to reduce bloat when dealing with the raw labels in Explore
You can edit the Loki config to change this list. Can you please detail the pain point having those additional labels in your logs exploration UX? How different is it having OTel resource attributes and logs attributes as Loki metadata versus having them as labels?
Not sure of the semantics here, Loki metadata vs labels. I'd imagine labels populate the dropdowns, while metadata is attached to entries like attributes.
What we want is quite simple: a curated list of understandable labels in the handy dropdown of the Explore UI, simply so as not to overwhelm app developers with irrelevant or verbose label names. But this is no major pain point; functionally we're OK.
I wasn't aware Loki configs in Grafana cloud may help here, will look into it.
We are discussing promoting a few more resource attributes as labels in Loki and Prometheus/Mimir:
- Adding deployment.environment.name to the list as it's now replacing deployment.environment. We expect a transition period during which both names may be used, so we don't plan to remove deployment.environment from the list.
- Adding service.version as it's very common to slice data by service version to identify whether application upgrades cause regressions.
How do you see these additions, would they make sense to you? Would they increase the UX challenges you have in mind?
I didn't know of deployment.environment changing, nor did I notice that service.version was missing (despite us using it). These changes are of no concern to us.
Possibly related to https://github.com/grafana/loki/issues/14788.