opentelemetry-dotnet icon indicating copy to clipboard operation
opentelemetry-dotnet copied to clipboard

Duplicate Log Attribute Keys

Open c4brei opened this issue 2 years ago • 17 comments

Question

Use Github Discussions.

Currently when using Logging attributes in Scope and LogMessage I observed, that the dotnet exporter exports every attribute not just the latest occurrence.

using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") }))    // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") }))    // scope 1
{
    logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}

In this example fooAttr is added 3 times to the attribute list

Output of ConsoleLogExporter:

LogRecord.Timestamp:               2023-03-22T12:44:02.5096056Z
LogRecord.CategoryName:            Program
LogRecord.LogLevel:                Information
LogRecord.FormattedMessage:        Hello Log Message: 'ghi - log msg' 'barAttrValue'
LogRecord.StateValues (Key:Value):
    fooAttr: ghi - log msg
    barAttr: barAttrValue
    OriginalFormat (a.k.a Body): Hello Log Message: '{fooAttr}' '{barAttr}'
LogRecord.ScopeValues (Key:Value):
[Scope.0]:fooAttr: abc - scope0
[Scope.1]:fooAttr: def - scope1

OpenTelemetry Collector receives list with duplicate attribute keys:

Timestamp: 2023-03-22 12:44:02.5096056 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(Hello Log Message: 'ghi - log msg' 'barAttrValue')
Attributes:
     -> dotnet.ilogger.category: Str(Program)
     -> fooAttr: Str(ghi - log msg)
     -> barAttr: Str(barAttrValue)
     -> {OriginalFormat}: Str(Hello Log Message: '{fooAttr}' '{barAttr}')
     -> fooAttr: Str(abc - scope0)
     -> fooAttr: Str(def - scope1)
Trace ID:
Span ID:
Flags: 0

This could be an unintentional side effect from the removal of the scope depth prefix (#3843). It seems that during serialization, first all attributes from the log message then all attributes from the scopes are serialized (beginning with the oldest scope). But no deduplication is done during this process.

The specification describes Log-Attributes to be a 'map<string, any>' where attribute keys are unique (data-model.md). Currently the actual output seems to be only a serialized list.

Is this behavior intended and the specification incomplete or is this behavior a Bug?

Additional Context

Depending on a receiving backend tool the attribute list may be interpreted differently. I.e. in Loki only the last occurrence of an attribute is parsed -> fooAttr is def - scope1 instead of ghi - log msg image

Reproduce

Using net6.0 and the following OpenTelemetry NuGet-Packages

    <PackageReference Include="Microsoft.Extensions.Logging" Version="7.0.0" />
    <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.4.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.4.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol.Logs" Version="1.4.0-rc.4" />
// See https://aka.ms/new-console-template for more information

using Microsoft.Extensions.Logging;
using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddOpenTelemetry(options =>
    {
        options.AddConsoleExporter();
        options.AddOtlpExporter(o =>
        {
            o.Protocol = OtlpExportProtocol.Grpc;
            o.HttpClientFactory = () =>
            {
                var client = new HttpClient();

                return client;
            };
            o.Endpoint = new Uri("http://localhost:4317/v1/logs");
        });
        options.IncludeScopes = true;
        options.ParseStateValues = true;
        options.IncludeFormattedMessage = true;
    });
});

var logger = loggerFactory.CreateLogger<Program>();

using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") }))    // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") }))    // scope 1
{
    logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}

c4brei avatar Mar 22 '23 13:03 c4brei

Deduplication is going to be expensive. The OTLP proto does not validate (its basically List and not a Map), so the onus is on the upstream to avoid duplicates.

  1. As a user prevent duplicates.
  2. OTel sdk OR OTLP Exporter can likely provide an opt-in mechanism to dedup for users who cannot ensure 1 and can accept the perf penality.

It is not an oversight, this was considered in the PR itself https://github.com/open-telemetry/opentelemetry-dotnet/pull/3843/files#diff-e28eaa16d5ac47130740a2783c474f316e9a1699aa3ada3a537951246e7cf517R172

Also, scopes is not the only way one can have duplicates. Logger.LogInfo("hello from {fruit} {fruit} {fruit}", "apple", "banana", "mango"); can also cause duplicates.

cijothomas avatar Mar 22 '23 16:03 cijothomas

@c4brei btw, excellent write up with clear description and repro steps!

cijothomas avatar Mar 22 '23 17:03 cijothomas

Tagged it for the OTLP Log Exporter stable release milestone. We could end up with not doing anything (and just ask users to not produce duplicate attributes), or provide opt-in mechanism to dedup.

cijothomas avatar Mar 22 '23 17:03 cijothomas

Thank you for the quick answer :)

The mentioned option to deduplicate Attributes would be very helpful. The resulting Mapping errors from duplicates in backend tools (i.e. the mentioned Loki case) can be hard to notice when the attribute value itself is difficult to read (e.g. guid).

In a larger project it will be difficult to enforce or remind everyone to use attributes with care. Especially since attribute creation can be hidden and retroactively altered just by adding a scope, with the same attribute name, earlier in the call chain, thus it is easy to introduce a mistake. You would have to go up/down the call chain to check for duplicate attribute names.

c4brei avatar Mar 23 '23 10:03 c4brei

Some additional ideas that might be helpful when this topic comes around for the stable release

Adjust attribute serialization order

  • first serialize scopes (oldest to youngest)
  • then attributes from LogMessage
  • so the last attribute in the list should always be the most recent one that was defined

Option to disable export of Attributes defined in a LogMessage

  • so that logger.LogInformation("Some fruit '{fruit}'", "banana"); behaves like logger.LogInformation($"Some fruit '{fruitObj.name}'");
  • this could prevent accidental creation of duplicate attributes
  • and prevent some performance decrease of string interpolation

c4brei avatar Mar 23 '23 11:03 c4brei

so that logger.LogInformation("Some fruit '{fruit}'", "banana"); behaves like logger.LogInformation($"Some fruit '{fruitObj.name}'");

This defeats the purpose of structured logging by putting everything into a string, so this should be discouraged.!

cijothomas avatar Apr 20 '23 15:04 cijothomas

This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.

github-actions[bot] avatar Oct 15 '24 03:10 github-actions[bot]

Added spec compliance tag. https://github.com/open-telemetry/opentelemetry-specification/pull/3987

cijothomas avatar Jan 25 '25 00:01 cijothomas

Added spec compliance tag. open-telemetry/opentelemetry-specification#3987

@cijothomas based on the specification link you provided, am I correctly understanding that SDKs will perform deduplication of attribute keys unless we explicitly specify not to?

If yes, then does dotnet SDK support this at the moment or is in the process of supporting it soon?

Ayush-Kaushik avatar Feb 13 '25 20:02 Ayush-Kaushik

Yes, SDKs are expected to follow the spec. OTel .NET is not following this part of the spec today OTel .NET should do de-duplication and comply with the spec. It is TBD if this should be done by default or make it opt-in. (The original reason for not doing this is performance, which is still valid, so most likely this'd be opt-in, instead of opt-out to maintain backward compatibility in terms of perf.) - But this is up-to the maintainers to make a call (I am not longer maintainer here.)

As of now, none is actively working on this feature, so no ETA that can be shared. Contributions are welcome.

cijothomas avatar Feb 13 '25 20:02 cijothomas

I ran into this today because when enabling IncludesScopes in an ASP.NET Core app, two scopes add RequestId, which is duplicated on the exported data. This causes issues with the Elasticsearch exporter, which doesn't (currently) account for this.

@alanwest I might try to pick this up shortly and at least play around with potential solutions. Would you agree that this should be opt-in? From a dev-ex point of view, I'd prefer that spec compliance is enabled by default, even if it has a small overhead, with an option to disable it if needed. Given that this occurs in any ASP.NET Core app with IncludeScopes and that it's not obvious when an export fails, defaulting to the spec-compliant behaviour feels cleaner.

I could start by reviewing the code properly to see what the overhead might be for an implementation, and then we could assess from there.

stevejgordon avatar Apr 10 '25 14:04 stevejgordon

I could start by reviewing the code properly to see what the overhead might be for an implementation

@stevejgordon Sounds good! Thus far in this discussion it has been assumed that the overhead will be high, so this sounds like a good place to start. From there we can make a more informed decision for whether the behavior is enabled or disabled by default.

alanwest avatar Apr 10 '25 17:04 alanwest

Even though we don't know the performance impact yet, just my opinion regarding the default behavior:

With missing attribute deduplication, it is very easy to introduce errors unnoticed. Just at some point higher in the call chain e.g. when traces for some API-interface are added -> all the downstream internal service/module logs will be affected!

Wrong logs will lead to wrong assumptions and wasted time (very costly)!

In my opinion, by default the SDK should comply with the specification. Less performant logs are better than possible wrong logs. Nobody expects the current behavior, since it deviates from the specification. Telemetry tools don't know how to handle such logs and may all behave inconsistently.

It should be a conscious choice for a developer to skip deduplication with the drawbacks for performance reasons.

c4brei avatar Apr 11 '25 10:04 c4brei

If we find out that it is really not possible to follow to the specification by default, I think we should try to extend the specification to handle duplicate attributes cases gracefully. So that telemetry tools can all behave correctly.

c4brei avatar Apr 11 '25 10:04 c4brei

Until this gets fixed in the SDK, I've shared a workaround to handle attribute deduplication using the opentelemetry collector transform processor before exporting to data-prepper which doesn't support ingesting duplicate attributes, in case someone hits the same wall as I did :)

https://github.com/opensearch-project/data-prepper/issues/3868#issuecomment-2988475617

samuel-poirier avatar Jun 19 '25 15:06 samuel-poirier

If we find out that it is really not possible to follow to the specification by default, I think we should try to extend the specification to handle duplicate attributes cases gracefully. So that telemetry tools can all behave correctly.

It is definitely possible to follow the specs by default.

cijothomas avatar Jun 19 '25 16:06 cijothomas

Thanks, @cijothomas. My initial thought, without having looked too deeply at the code, was that data from scopes needs to be added to the attributes on the LogRecord before processors run, and at that point, the correct deduplication logic can occur to resolve the final value for each unique attribute name. This would also allow consumers to redact or remove attributes from scope data in a processor, which is currently impossible. At a high level, would that seem reasonable?

stevejgordon avatar Jun 20 '25 09:06 stevejgordon