Duplicate Log Attribute Keys
Question
Use Github Discussions.
Currently when using Logging attributes in Scope and LogMessage I observed, that the dotnet exporter exports every attribute not just the latest occurrence.
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") })) // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") })) // scope 1
{
logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}
In this example fooAttr is added 3 times to the attribute list
Output of ConsoleLogExporter:
LogRecord.Timestamp: 2023-03-22T12:44:02.5096056Z LogRecord.CategoryName: Program LogRecord.LogLevel: Information LogRecord.FormattedMessage: Hello Log Message: 'ghi - log msg' 'barAttrValue' LogRecord.StateValues (Key:Value): fooAttr: ghi - log msg barAttr: barAttrValue OriginalFormat (a.k.a Body): Hello Log Message: '{fooAttr}' '{barAttr}' LogRecord.ScopeValues (Key:Value): [Scope.0]:fooAttr: abc - scope0 [Scope.1]:fooAttr: def - scope1
OpenTelemetry Collector receives list with duplicate attribute keys:
Timestamp: 2023-03-22 12:44:02.5096056 +0000 UTC SeverityText: Information SeverityNumber: Info(9) Body: Str(Hello Log Message: 'ghi - log msg' 'barAttrValue') Attributes: -> dotnet.ilogger.category: Str(Program) -> fooAttr: Str(ghi - log msg) -> barAttr: Str(barAttrValue) -> {OriginalFormat}: Str(Hello Log Message: '{fooAttr}' '{barAttr}') -> fooAttr: Str(abc - scope0) -> fooAttr: Str(def - scope1) Trace ID: Span ID: Flags: 0
This could be an unintentional side effect from the removal of the scope depth prefix (#3843). It seems that during serialization, first all attributes from the log message then all attributes from the scopes are serialized (beginning with the oldest scope). But no deduplication is done during this process.
The specification describes Log-Attributes to be a 'map<string, any>' where attribute keys are unique (data-model.md). Currently the actual output seems to be only a serialized list.
Is this behavior intended and the specification incomplete or is this behavior a Bug?
Additional Context
Depending on a receiving backend tool the attribute list may be interpreted differently. I.e. in Loki only the last occurrence of an attribute is parsed -> fooAttr is def - scope1 instead of ghi - log msg

Reproduce
Using net6.0 and the following OpenTelemetry NuGet-Packages
<PackageReference Include="Microsoft.Extensions.Logging" Version="7.0.0" />
<PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.4.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.4.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol.Logs" Version="1.4.0-rc.4" />
// See https://aka.ms/new-console-template for more information
using Microsoft.Extensions.Logging;
using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder.AddOpenTelemetry(options =>
{
options.AddConsoleExporter();
options.AddOtlpExporter(o =>
{
o.Protocol = OtlpExportProtocol.Grpc;
o.HttpClientFactory = () =>
{
var client = new HttpClient();
return client;
};
o.Endpoint = new Uri("http://localhost:4317/v1/logs");
});
options.IncludeScopes = true;
options.ParseStateValues = true;
options.IncludeFormattedMessage = true;
});
});
var logger = loggerFactory.CreateLogger<Program>();
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") })) // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") })) // scope 1
{
logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}
Deduplication is going to be expensive. The OTLP proto does not validate (its basically List and not a Map), so the onus is on the upstream to avoid duplicates.
- As a user prevent duplicates.
- OTel sdk OR OTLP Exporter can likely provide an opt-in mechanism to dedup for users who cannot ensure 1 and can accept the perf penality.
It is not an oversight, this was considered in the PR itself https://github.com/open-telemetry/opentelemetry-dotnet/pull/3843/files#diff-e28eaa16d5ac47130740a2783c474f316e9a1699aa3ada3a537951246e7cf517R172
Also, scopes is not the only way one can have duplicates. Logger.LogInfo("hello from {fruit} {fruit} {fruit}", "apple", "banana", "mango"); can also cause duplicates.
@c4brei btw, excellent write up with clear description and repro steps!
Tagged it for the OTLP Log Exporter stable release milestone. We could end up with not doing anything (and just ask users to not produce duplicate attributes), or provide opt-in mechanism to dedup.
Thank you for the quick answer :)
The mentioned option to deduplicate Attributes would be very helpful. The resulting Mapping errors from duplicates in backend tools (i.e. the mentioned Loki case) can be hard to notice when the attribute value itself is difficult to read (e.g. guid).
In a larger project it will be difficult to enforce or remind everyone to use attributes with care. Especially since attribute creation can be hidden and retroactively altered just by adding a scope, with the same attribute name, earlier in the call chain, thus it is easy to introduce a mistake. You would have to go up/down the call chain to check for duplicate attribute names.
Some additional ideas that might be helpful when this topic comes around for the stable release
Adjust attribute serialization order
- first serialize scopes (oldest to youngest)
- then attributes from LogMessage
- so the last attribute in the list should always be the most recent one that was defined
Option to disable export of Attributes defined in a LogMessage
- so that
logger.LogInformation("Some fruit '{fruit}'", "banana");behaves likelogger.LogInformation($"Some fruit '{fruitObj.name}'"); - this could prevent accidental creation of duplicate attributes
- and prevent some performance decrease of string interpolation
so that logger.LogInformation("Some fruit '{fruit}'", "banana"); behaves like logger.LogInformation($"Some fruit '{fruitObj.name}'");
This defeats the purpose of structured logging by putting everything into a string, so this should be discouraged.!
This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.
Added spec compliance tag. https://github.com/open-telemetry/opentelemetry-specification/pull/3987
Added spec compliance tag. open-telemetry/opentelemetry-specification#3987
@cijothomas based on the specification link you provided, am I correctly understanding that SDKs will perform deduplication of attribute keys unless we explicitly specify not to?
If yes, then does dotnet SDK support this at the moment or is in the process of supporting it soon?
Yes, SDKs are expected to follow the spec. OTel .NET is not following this part of the spec today OTel .NET should do de-duplication and comply with the spec. It is TBD if this should be done by default or make it opt-in. (The original reason for not doing this is performance, which is still valid, so most likely this'd be opt-in, instead of opt-out to maintain backward compatibility in terms of perf.) - But this is up-to the maintainers to make a call (I am not longer maintainer here.)
As of now, none is actively working on this feature, so no ETA that can be shared. Contributions are welcome.
I ran into this today because when enabling IncludesScopes in an ASP.NET Core app, two scopes add RequestId, which is duplicated on the exported data. This causes issues with the Elasticsearch exporter, which doesn't (currently) account for this.
@alanwest I might try to pick this up shortly and at least play around with potential solutions. Would you agree that this should be opt-in? From a dev-ex point of view, I'd prefer that spec compliance is enabled by default, even if it has a small overhead, with an option to disable it if needed. Given that this occurs in any ASP.NET Core app with IncludeScopes and that it's not obvious when an export fails, defaulting to the spec-compliant behaviour feels cleaner.
I could start by reviewing the code properly to see what the overhead might be for an implementation, and then we could assess from there.
I could start by reviewing the code properly to see what the overhead might be for an implementation
@stevejgordon Sounds good! Thus far in this discussion it has been assumed that the overhead will be high, so this sounds like a good place to start. From there we can make a more informed decision for whether the behavior is enabled or disabled by default.
Even though we don't know the performance impact yet, just my opinion regarding the default behavior:
With missing attribute deduplication, it is very easy to introduce errors unnoticed. Just at some point higher in the call chain e.g. when traces for some API-interface are added -> all the downstream internal service/module logs will be affected!
Wrong logs will lead to wrong assumptions and wasted time (very costly)!
In my opinion, by default the SDK should comply with the specification. Less performant logs are better than possible wrong logs. Nobody expects the current behavior, since it deviates from the specification. Telemetry tools don't know how to handle such logs and may all behave inconsistently.
It should be a conscious choice for a developer to skip deduplication with the drawbacks for performance reasons.
If we find out that it is really not possible to follow to the specification by default, I think we should try to extend the specification to handle duplicate attributes cases gracefully. So that telemetry tools can all behave correctly.
Until this gets fixed in the SDK, I've shared a workaround to handle attribute deduplication using the opentelemetry collector transform processor before exporting to data-prepper which doesn't support ingesting duplicate attributes, in case someone hits the same wall as I did :)
https://github.com/opensearch-project/data-prepper/issues/3868#issuecomment-2988475617
If we find out that it is really not possible to follow to the specification by default, I think we should try to extend the specification to handle duplicate attributes cases gracefully. So that telemetry tools can all behave correctly.
It is definitely possible to follow the specs by default.
Thanks, @cijothomas. My initial thought, without having looked too deeply at the code, was that data from scopes needs to be added to the attributes on the LogRecord before processors run, and at that point, the correct deduplication logic can occur to resolve the final value for each unique attribute name. This would also allow consumers to redact or remove attributes from scope data in a processor, which is currently impossible. At a high level, would that seem reasonable?