azure-sdk-for-js

Structured logs with deeply nested attributes are partially truncated in Application Insights

Open • kamaz opened this issue 5 months ago • 11 comments

  • Package Name: @azure/monitor-opentelemetry-exporter
  • Package Version: 1.0.0-beta.32
  • Operating system: linux
  • [x] nodejs
    • version: v20.19.5

Describe the bug

When using the OpenTelemetry integration with Azure Functions (Flex Consumption Plan) on Linux, properties under customDimensions are truncated when they contain complex nested objects exceeding 8 KB in size.

Specifically, nested objects inside attributes are cut off mid-structure, resulting in incomplete JSON data being ingested into Application Insights.

This makes it impossible to query or analyse structured log data accurately when large nested objects are present.

To Reproduce

Steps to reproduce the behaviour:

  1. Use Node.js with Azure Functions (Flex Consumption Plan) on Linux.
  2. Integrate @azure/monitor-opentelemetry-exporter with OpenTelemetry for logging.
  3. Emit a log containing nested properties larger than 8 KB, e.g.:
{
  attributes: {
    property: {
      nestedProperty1: { /* exceeds 5KB */ },
      nestedProperty2: { /* exceeds 5KB */ },
      nestedProperty3: { /* exceeds 5KB */ }
    },
    property1: "value"
  }
}
  4. Observe the resulting data in Application Insights under customDimensions.
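A minimal sketch of the step 3 payload, assuming plain filler strings stand in for the real nested objects (`makeNested` and `sizeOf` are hypothetical helpers, not SDK functions). It only builds and measures the attributes; emitting them still goes through the application's OpenTelemetry logger:

```typescript
// Hypothetical helper: returns an object whose JSON form is a little over `bytes` characters.
function makeNested(bytes: number): Record<string, unknown> {
  return { filler: "x".repeat(bytes), inner: { note: "nested" } };
}

// Same shape as the attributes in step 3.
const attributes = {
  property: {
    nestedProperty1: makeNested(5 * 1024),
    nestedProperty2: makeNested(5 * 1024),
    nestedProperty3: makeNested(5 * 1024),
  },
  property1: "value",
};

// Serialized length in characters (equal to bytes here, since the payload is ASCII).
const sizeOf = (value: unknown): number => JSON.stringify(value).length;

console.log("nestedProperty1:", sizeOf(attributes.property.nestedProperty1));
console.log("property:", sizeOf(attributes.property));
```

Each nested object is a little over 5 KB, so `attributes.property` serializes to roughly 15 KB while `property1` stays tiny, which matches the shape that triggers the truncation.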

Expected behaviour

Nested properties should be preserved in full when their value does not exceed 8 KB, or truncation should occur after proper serialisation of the entire nested structure, rather than mid-object.

If truncation is necessary, it should apply to individual string values rather than to intermediate objects, so that the result is still valid JSON and every property is kept.
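One way to read the expected behaviour, sketched under the assumption that the payload is plain JSON data: truncate string leaves instead of cutting the serialized text. `truncateLeaves` is a hypothetical helper; it guarantees valid JSON and keeps every key, but it does not by itself enforce an overall size cap (that would need a per-leaf budget).

```typescript
// Hypothetical helper: truncates only string leaves, so the result is always valid JSON
// and every key in the original structure survives.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function truncateLeaves(value: Json, maxStringLength: number): Json {
  if (typeof value === "string") {
    return value.slice(0, maxStringLength);
  }
  if (Array.isArray(value)) {
    return value.map((item) => truncateLeaves(item, maxStringLength));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = truncateLeaves(child, maxStringLength);
    }
    return out;
  }
  return value; // numbers, booleans and null pass through unchanged
}

const payload: Json = { property: { nested: { text: "a".repeat(10000) } }, property1: "value" };

// Cutting the serialized text at a fixed length can stop mid-object...
const naive = JSON.stringify(payload).slice(0, 8192);
// ...whereas truncating leaves keeps the structure intact.
const safe = JSON.stringify(truncateLeaves(payload, 1000));
```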

Screenshots

Truncated nested properties (nested_message.png):


Working version of the log after commenting out the property truncation in logUtils.ts (root_message.png):


Additional context

  • The issue only occurs when objects exceeding 8 KB are nested inside another property (e.g., attributes.property).
  • When the same data is moved to the root level of attributes, the truncation does not occur.
  • The likely cause is the truncation logic in logUtils.ts, which cuts the serialized value at a fixed length instead of truncating the individual values inside the object.
  • Example reproductions:
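As a hypothetical illustration of the suspected cause in the bullets above, the model below cuts each top-level attribute's string form at an assumed 8192-character limit. It is not the actual logUtils.ts code, but it reproduces both observations: the nested layout yields invalid JSON, while the same objects at the root level each stay under the limit and survive intact.

```typescript
// Hypothetical model of the described behaviour, NOT the real logUtils.ts code:
// each top-level attribute is turned into a string and cut at a fixed length.
const MAX_PROPERTY_LENGTH = 8192; // assumed per-property limit in characters

function truncateTopLevel(attributes: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attributes)) {
    const text = typeof value === "string" ? value : JSON.stringify(value);
    out[key] = text.slice(0, MAX_PROPERTY_LENGTH);
  }
  return out;
}

const isValidJson = (s: string): boolean => {
  try {
    JSON.parse(s);
    return true;
  } catch {
    return false;
  }
};

const chunk = { data: "x".repeat(5 * 1024) };
const nested = truncateTopLevel({ property: { a: chunk, b: chunk, c: chunk } });
const root = truncateTopLevel({ a: chunk, b: chunk, c: chunk });

console.log(isValidJson(nested.property)); // false: cut mid-object
console.log(Object.values(root).every(isValidJson)); // true: each value is under the limit
```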

kamaz, Oct 10 '25 09:10

@JacksonWeber can you take a look at this issue?

hectorhdzg, Oct 10 '25 16:10

This is currently an issue across languages. I'll discuss further with ingestion folks.

JacksonWeber, Oct 20 '25 19:10

@kamaz Because the value we expect to be ingested here is a string, we can't know in advance whether it will contain stringified JSON. To ensure that JSON data passed in logs is not truncated in a way that makes it invalid, the best approach would be to use a span or log record processor that reshapes that data so it fits within transfer limits and remains valid.
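A minimal sketch of the kind of reshaping such a processor could do, assuming the attributes are plain JSON-like objects: any object-valued attribute whose JSON form exceeds the limit is split into one attribute per child key (`parent.child`), so each value remains complete. `splitLargeAttributes` is a hypothetical helper; wiring it into a log record processor's `onEmit` hook depends on the `@opentelemetry/sdk-logs` version in use, and string values longer than the limit are left as-is.

```typescript
// Hypothetical helper for a custom log record processor: an object-valued attribute whose
// JSON form exceeds `limit` is replaced by one attribute per child key ("parent.child"),
// each kept as a complete value so it still serializes to valid JSON.
type Attributes = Record<string, unknown>;

function splitLargeAttributes(attributes: Attributes, limit: number): Attributes {
  const out: Attributes = {};
  for (const [key, value] of Object.entries(attributes)) {
    const isPlainObject = value !== null && typeof value === "object" && !Array.isArray(value);
    if (!isPlainObject || JSON.stringify(value).length <= limit) {
      out[key] = value; // small objects and non-object values are left untouched
      continue;
    }
    for (const [childKey, child] of Object.entries(value as Attributes)) {
      // Recurse so a child that is itself too large is split one level further.
      Object.assign(out, splitLargeAttributes({ [`${key}.${childKey}`]: child }, limit));
    }
  }
  return out;
}
```

With the issue's payload, `attributes.property` becomes `property.nestedProperty1`, `property.nestedProperty2` and `property.nestedProperty3`, each about 5 KB.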

JacksonWeber, Oct 28 '25 17:10

@JacksonWeber Thanks for the quick response and for clarifying the current behaviour! 🙏

After reviewing the OpenTelemetry specification, I agree that attributes aren’t the right place for complex or structured objects. They’re meant to be lightweight key/value tags for correlation and filtering (as with spans, traces, and metrics). According to the OpenTelemetry spec, attributes only support primitive types (string, boolean, double, int64) or homogeneous arrays of those primitives, not nested objects or arrays of objects. So the current exporter behaviour, which rejects or flattens complex attribute values, aligns with the spec.
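The primitive-or-homogeneous-array rule summarized above can be expressed as a small type guard. This is a simplified sketch (`isSpecAttributeValue` is a hypothetical name): JavaScript numbers stand in for both `double` and `int64`, and null array elements are simply rejected.

```typescript
// Simplified check of the rule as summarized: a primitive, or an array whose elements
// are all primitives of one type. Nested objects and mixed arrays are rejected.
type Primitive = string | boolean | number;

const isPrimitive = (v: unknown): v is Primitive =>
  typeof v === "string" || typeof v === "boolean" || typeof v === "number";

function isSpecAttributeValue(value: unknown): boolean {
  if (isPrimitive(value)) return true;
  if (!Array.isArray(value)) return false;
  // Every element must be a primitive, and all elements must share one type.
  const types = new Set(value.map((item) => typeof item));
  return value.every(isPrimitive) && types.size <= 1;
}

console.log(isSpecAttributeValue("order-12345"));          // true
console.log(isSpecAttributeValue([1, 2, 3]));              // true
console.log(isSpecAttributeValue(["a", 1]));               // false
console.log(isSpecAttributeValue({ nested: { a: "b" } })); // false
```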

However, the main limitation lies with the body field. When a structured object is passed as the body, it is currently coerced into a string ([object Object]), which discards structure and effectively prevents using Application Insights for true structured logging and querying. According to the OpenTelemetry log data model, the body is defined as an any type, supporting complex objects such as maps, arrays, and nested values. This is further discussed in OpenTelemetry semantic conventions issue #1651, which clearly distinguishes attributes (light metadata) from body (potentially richer, structured content).
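The `[object Object]` result is what plain string coercion of a JavaScript object produces. The sketch below assumes that is the conversion happening on the exporter side; the comment reports the effect, not a specific code path. `JSON.stringify` keeps the structure, though only as text:

```typescript
// String coercion of a plain object falls back to Object.prototype.toString.
const body = { message: "Processed order", orderId: "12345" };

const coerced = String(body);            // structure lost
const serialized = JSON.stringify(body); // structure kept, as JSON text

console.log(coerced);    // [object Object]
console.log(serialized); // {"message":"Processed order","orderId":"12345"}
```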

Example message:


Application Insights view:


The current exporter implementation limits the ability to use the full functionality of Application Insights, especially since the service itself can already accept messages up to 64 KB and store additional structured context in customDimensions.

A potential implementation could be:

  • Detect when the body is a complex object,
  • Use a specific field (e.g., message) as the primary log message,
  • And map all remaining properties from the object into customDimensions.
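The three steps above can be sketched as a pure function, assuming customDimensions are treated as string-valued and that a body without a string `message` field gets an empty message. `mapStructuredBody` and `MappedLog` are hypothetical names, not part of the exporter:

```typescript
// Hypothetical mapping: a `message` field becomes the log message and every other
// top-level property of an object body becomes a customDimension.
interface MappedLog {
  message: string;
  customDimensions: Record<string, string>;
}

function mapStructuredBody(body: unknown): MappedLog {
  // Non-object bodies are converted with String(), assumed to match the current behaviour.
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { message: String(body), customDimensions: {} };
  }
  const { message, ...rest } = body as Record<string, unknown>;
  const customDimensions: Record<string, string> = {};
  for (const [key, value] of Object.entries(rest)) {
    if (value === undefined) continue; // nothing to record
    // Values are stored as strings; objects and arrays are serialized as JSON text.
    customDimensions[key] = typeof value === "string" ? value : JSON.stringify(value);
  }
  return { message: typeof message === "string" ? message : "", customDimensions };
}
```

String bodies map exactly as they do when no structure is present, which is where the backward compatibility comes from; only object bodies take the new path.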

This would align perfectly with the OpenTelemetry specification, maintain backward compatibility, and unlock the structured logging capabilities already available in the Application Insights platform but currently limited by the JS SDK.

If it helps, we’d be happy to draft or contribute a PR to implement this behaviour in the SDK and help expedite the change, provided you’re aligned with the proposal.

kamaz, Oct 30 '25 12:10

The above fix should resolve the message field serialization issue with complex objects. It will go out in the upcoming release.

JacksonWeber, Nov 13 '25 02:11

@JacksonWeber Thanks for the recent work on this and for the PR.

After reviewing the changes and validating them against the original problem, I wanted to share some feedback and ask a couple of clarifying questions.

🚩 Summary of remaining problems

  1. Structured data is collapsed into the message field

The PR serialises the entire structured log body into message. While this avoids nested-object truncation, it introduces two problems:

  • Logs are no longer query-friendly in Application Insights. Users must re-parse the message field to get back the original structure.
  • Everything must fit inside the effective 32 KB message limit, even though Application Insights supports ~64 KB overall and ~8 KB per custom field.

This effectively reduces usable log capacity by half.

  2. The fix reduces the maximum usable size from 64 KB to 32 KB

This issue described nested-object truncation caused by the ~8 KB cap on each field. By moving all content into message, logs now hit the 32 KB limit first, making it impossible to utilise the full capacity of Application Insights. This is a significant loss for our applications that handle:

  • large payload transformations
  • request/response debugging
  • batch-processing diagnostics
  3. Log messages exceeding 32 KB will be truncated by the exporter, which makes them unusable for querying.
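The capacity figures in points 1 and 2 come down to simple arithmetic. The sketch below uses the limits as quoted in this thread (32 KB message, 8 KB per customDimension, about 64 KB per item) rather than documented values:

```typescript
// Limits as quoted in this thread, in KB (assumptions, not official figures).
const MESSAGE_KB = 32;
const FIELD_KB = 8;
const ITEM_KB = 64;

// Everything serialized into `message`: capped by the message limit alone.
const messageOnlyKb = Math.min(MESSAGE_KB, ITEM_KB);

// Message plus `fields` customDimensions, each capped separately, up to the item limit.
const splitKb = (fields: number): number => Math.min(ITEM_KB, MESSAGE_KB + fields * FIELD_KB);

console.log(messageOnlyKb); // 32
console.log(splitKb(4));    // 64
```

With four or more full customDimensions, the split layout reaches the 64 KB item size, which is the "half" referred to above.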

💡 Potential approach (with working example)

I created an example of an alternative implementation that keeps the benefits of structured logging while still respecting Application Insights limits:

  • Place the human-readable text from the structured payload into the Application Insights message column, making it easy to read (up to 32 KB)
  • Place additional structured fields from the payload into customDimensions (each up to 8 KB)

This allows logs to use the full 64 KB and scales automatically if Application Insights raises its limits in the future.

PR example: https://github.com/Azure/azure-sdk-for-js/pull/36645/files

Example log construction: https://github.com/gradientedge/msft-azure-logging/blob/main/ingest-ai-root.js

Example:

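// Placeholder payloads for illustration; sizes match the variable names.
const bigMessageString32kb = "m".repeat(32 * 1024);
const bigChunk8kb = "c".repeat(8 * 1024);

// `logger` is the application's structured logger (its setup is not shown here).
logger.info("Processed order", {
  orderId: "12345",
  message: bigMessageString32kb,          // Uses the AI message field
  customObject: {
    level1: {
      level2: {
        level3: { largeField: bigChunk8kb }
      }
    }
  }
});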

Output example (Application Insights friendly):

{
  "message": "32KB human-readable message…",
  "customDimensions": {
    "orderId": "12345",
    "customObject.level1.level2.level3.largeField": "8KB chunk…"
  }
}
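The mapping from the input to the output example above can be sketched as a recursive flatten. The limits are the assumed values quoted in this thread, and `flatten` and `toAppInsightsShape` are hypothetical helpers, not the code in the linked PR:

```typescript
// Assumed limits, as quoted in this thread.
const MESSAGE_LIMIT = 32 * 1024;
const FIELD_LIMIT = 8 * 1024;

// Joins nested keys with "." and writes each leaf as its own string-valued dimension.
function flatten(value: unknown, prefix: string, out: Record<string, string>): void {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
    return;
  }
  if (value === undefined) return; // nothing to record
  const text = typeof value === "string" ? value : JSON.stringify(value);
  out[prefix] = text.slice(0, FIELD_LIMIT); // a cut here only shortens this one leaf
}

function toAppInsightsShape(
  payload: Record<string, unknown>,
): { message: string; customDimensions: Record<string, string> } {
  const { message, ...rest } = payload;
  const customDimensions: Record<string, string> = {};
  flatten(rest, "", customDimensions);
  const text = typeof message === "string" ? message : "";
  return { message: text.slice(0, MESSAGE_LIMIT), customDimensions };
}
```

Because each leaf is capped separately, an oversized value only shortens itself; it never cuts through a sibling or leaves a half-written object behind.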

Application Insights view:


This pattern avoids truncation, keeps logs queryable, and uses the full allowed log size.

Questions for clarification

I'd also really appreciate your thoughts on the following:

  1. Is there a specific reason the split-field approach (message + customDimensions) wasn't considered? From an SDK perspective it seems to best preserve structure and make full use of Application Insights limits and message readability.
  2. Is there an alternative design you would recommend for supporting large structured logs (32–64KB+) in Application Insights?
  3. The current PR permanently caps structured logs at 32 KB. Is that an intentional design decision?

kamaz, Nov 17 '25 12:11

@kamaz Thanks for your time on the call this week. I've been working with a test app to understand your logging scenario.

It's currently possible to do something like:

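import { logs, SeverityNumber } from "@opentelemetry/api-logs";

// Assumes a LoggerProvider with the Azure Monitor exporter is registered; the logger name is illustrative.
const logger = logs.getLogger("example");

logger.emit({
    severityNumber: SeverityNumber.INFO,
    body: "Successful Test",
    attributes: {
        "custom.dependency": { test: "bing.com", thing: "yes", thing2: 123, nested: { a: "b" } },
    },
});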

and get structured data output in Application Insights. Does this approach not work for your use case because of the 8 KB limit per custom dimension? Would a solution like auto-splitting custom dimensions specifically to avoid truncation be valid for you?

JacksonWeber, Nov 27 '25 00:11

@JacksonWeber Just to clarify, the size limitation isn't coming from customDimensions or Application Insights itself, but from the SDK (see this line in logUtils.ts).

Could you please clarify what you mean by "autosplitting"?

How would that work in practice? Would the exporter create multiple log items for a single large payload, or split properties across customDimensions to avoid truncation?

kamaz, Nov 27 '25 15:11

@kamaz I did some more investigation here. You're correct that the 8kb limit is just an SDK limitation. We can absolutely remove the 8kb limit to align with whatever maximum size ingestion will accept for custom properties.

Simply removing the SDK size limitation is easier than what I was describing as auto splitting (which would have been slicing structured data into 8kb chunks across multiple custom dimensions).
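For comparison, a sketch of the auto-splitting idea as described here: serialize the value once and slice the text into fixed-size chunks under numbered keys. The `key.N` naming and both helpers are hypothetical. It also makes a drawback visible: no single chunk is valid JSON, so anything querying the data has to stitch the chunks back together first.

```typescript
// Hypothetical chunking: serialize once, then store fixed-size slices as "key.0", "key.1", ...
function splitIntoChunks(key: string, value: unknown, chunkSize: number): Record<string, string> {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  const chunks: Record<string, string> = {};
  for (let i = 0, part = 0; i < text.length; i += chunkSize, part++) {
    chunks[`${key}.${part}`] = text.slice(i, i + chunkSize);
  }
  return chunks;
}

// Reassembly concatenates the chunks in order; only the joined text is parseable JSON.
function joinChunks(key: string, chunks: Record<string, string>): string {
  let text = "";
  for (let part = 0; `${key}.${part}` in chunks; part++) {
    text += chunks[`${key}.${part}`];
  }
  return text;
}
```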

I don't believe the SDK should be in the business of splitting up customer log records across multiple telemetry items. It sounds like the simplest and most effective "fix" is to expand the accepted size of a custom dimension as much as possible given ingestion limitations so that there's no incentive to place large structured data in the message field. My understanding now is that you were only using message because it had a higher limit of 32k vs. the considerably smaller limit on custom dimensions. Is that correct?

JacksonWeber, Dec 02 '25 19:12

@JacksonWeber Thanks for the follow-up.

To answer your question, we didn't migrate to the message field because it still has the 32 KB size limit. Additionally, for larger structured logs the message is truncated, which makes it impossible to query, and the log messages are no longer human-readable in Application Insights.

Removing the 8 KB limit in the SDK would resolve the issue on our side, because it would allow us (and anyone else) to take full advantage of the Application Insights log size.

Happy to test any preview build or provide additional context if needed.

kamaz, Dec 04 '25 10:12

@kamaz Awesome, ok that makes sense. I'll continue to push for the limit on custom dimensions to be expanded for your use case. As mentioned on our call this work will go hand-in-hand with the migration to typespec to define the size constraints of each field passed to ingestion. I'll keep you updated on this thread as that work progresses!

JacksonWeber, Dec 04 '25 22:12