opentelemetry-collector-contrib

New component: blob writer span processor

Open michaelsafyan opened this issue 1 year ago • 2 comments

The purpose and use-cases of the new component

The Blob Writer Span Processor takes selected span content and writes it to a blob storage system.

This component is intended to address a number of concerns:

  • Sensitivity of data: certain data may be necessary to retain for debugging but may not be suitable for access by all oncallers or others with access to general operational data; writing certain attributes to a separate blob storage system may allow for finer-grained, alternative access restrictions to be applied compared with the general ops backend.
  • Size of the data: some operational backends may have limitations around the size of the data they can receive; sending large attributes to a separate blob storage backend may avoid these limitations.
  • Costs of storage: while most operational data may need to be available quickly to address incidents, certain attributes may need to be accessed less frequently and may be suitable for lower cost, long-term storage options.

Motivating Examples:

  • HTTP request/response pairs stored in span attributes (http.request.body.content and http.response.body.content)
  • LLM prompt/response pairs stored in span event attributes (gen_ai.prompt and gen_ai.completion)

Use Cases Related to the Examples:

  • Access restrictions beyond those of the general operations solution are needed; writing to a separate blob storage allows additional access controls to be applied. Links to the destination enable the results to be located in a separate backend storage system that provides the necessary checks on access.

  • Full request/responses get used rarely by the oncallers, only when their end user opens a ticket through their support mechanism; writing this data to a separate, low-cost storage system allows the user to save on their ops storage costs.

Example configuration for the component

The configuration consists of a list of ConfigStanzas:

config := LIST[ConfigStanza]

Each config stanza defines how it will handle exactly one type of attribute. The properties of the stanza are:

  • match_attribute_key: (REQUIRED) The exact attribute key to match (e.g. http.request.body.content)
  • match_attribute_only_in: (OPTIONAL) Allows the key to be matched in only a specific part of the signal.
    • Supported values include:
      • SPAN: only look at span-level attributes (not resource, scope, or event attributes)
      • RESOURCE: only look at resource-level attributes (not span, scope, or event attributes)
      • SCOPE: only look at scope-level attributes (not span, resource, or event attributes)
      • EVENT: only look at event-level attributes (not span, resource, or scope attributes)
  • destination_uri: (REQUIRED) The pattern to which to write the data.
    • Ex: gs://example-bucket/full-http/request/payloads/${trace_id}/${span_id}.txt
    • Patterns may reference other parts of the signal, including:
      • trace_id
      • span_id
      • resource.attributes
      • span.attributes
      • scope.attributes
    • Keys can be referenced with dot or bracket notation (e.g. span.attributes.foo or span.attributes[foo]).
  • content_type: (OPTIONAL) Indicates the content type of the attribute (default: AUTO)
    • Options include:
      • AUTO: attempt to infer the content type automatically
      • extract_from: expr: derive it from other information in the signal (e.g. extract_from: span.attributes["http.request.header.content-type"])
      • any literal string (e.g. "application/json"): to use a static value
  • fraction_to_write: (OPTIONAL) Allows down sampling of the payloads. Defaults to 1.0 (i.e. 100%)
  • fraction_written_behavior: (OPTIONAL) Defaults to REPLACE_WITH_REFERENCE.
    • Options include:
      • REPLACE_WITH_REFERENCE: replace the value with a reference to the destination location.
      • KEEP: the write is a copy, but the original data is not altered.
      • DROP: the fact that a write happened will not be recorded in the attribute
  • fraction_not_written_behavior: (OPTIONAL) Defaults to DROP.
    • Options include:
      • DROP: remove the attribute in its entirety
      • KEEP: don't modify the original data if this fraction wasn't matched
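As a rough illustration of how the defaults and the two fraction behaviors listed above could fit together, here is a minimal Go sketch. The struct layout, the field names, and the ApplyDefaultsAndValidate and BehaviorFor helpers are assumptions made for this example; they are not taken from the actual component.

```go
package main

import (
	"errors"
	"fmt"
)

// ConfigStanza mirrors the properties listed above.
// The Go field names are hypothetical and chosen for illustration only.
type ConfigStanza struct {
	MatchAttributeKey          string
	MatchAttributeOnlyIn       string   // "", "SPAN", "RESOURCE", "SCOPE" or "EVENT"
	DestinationURI             string
	ContentType                string   // "AUTO" or a literal content type
	FractionToWrite            *float64 // nil means "not set"
	FractionWrittenBehavior    string
	FractionNotWrittenBehavior string
}

// ApplyDefaultsAndValidate checks the required fields and fills in the
// documented defaults for the optional ones.
func (c *ConfigStanza) ApplyDefaultsAndValidate() error {
	if c.MatchAttributeKey == "" {
		return errors.New("match_attribute_key is required")
	}
	if c.DestinationURI == "" {
		return errors.New("destination_uri is required")
	}
	switch c.MatchAttributeOnlyIn {
	case "", "SPAN", "RESOURCE", "SCOPE", "EVENT":
	default:
		return fmt.Errorf("unsupported match_attribute_only_in: %q", c.MatchAttributeOnlyIn)
	}
	if c.ContentType == "" {
		c.ContentType = "AUTO"
	}
	if c.FractionToWrite == nil {
		one := 1.0
		c.FractionToWrite = &one
	}
	if *c.FractionToWrite < 0 || *c.FractionToWrite > 1 {
		return fmt.Errorf("fraction_to_write must be in [0, 1], got %v", *c.FractionToWrite)
	}
	if c.FractionWrittenBehavior == "" {
		c.FractionWrittenBehavior = "REPLACE_WITH_REFERENCE"
	}
	if c.FractionNotWrittenBehavior == "" {
		c.FractionNotWrittenBehavior = "DROP"
	}
	return nil
}

// BehaviorFor returns the behavior that applies to one payload, given a
// sample r drawn uniformly from [0, 1). An unset fraction is treated as 1.0.
func (c ConfigStanza) BehaviorFor(r float64) string {
	fraction := 1.0
	if c.FractionToWrite != nil {
		fraction = *c.FractionToWrite
	}
	if r < fraction {
		return c.FractionWrittenBehavior
	}
	return c.FractionNotWrittenBehavior
}
```

In this sketch a stanza that sets only match_attribute_key and destination_uri writes every payload and replaces the attribute value with a reference, which matches the defaults stated above.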

Here is a full example with the above in mind:

 - match_attribute_key: http.request.body.content
   match_attribute_only_in: SPAN
   destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/request.json"
   content_type: "application/json"

 - match_attribute_key: http.response.body.content
   match_attribute_only_in: SPAN
   destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/response.json"
   content_type: "application/json"
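To make the destination_uri patterns more concrete, here is a minimal Go sketch of how the ${...} placeholders might be expanded. Everything in it is an assumption for illustration: the SignalContext type, the ExpandDestination function, and the treatment of env.* as a lookup into a plain map (the env prefix appears in the example above but is not in the list of referenceable parts). It is hand-rolled with a regular expression and is not the OTTL-based approach discussed later in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches ${...} references inside a destination_uri pattern.
var placeholder = regexp.MustCompile(`\$\{([^}]+)\}`)

// SignalContext is a hypothetical, flattened view of the parts of the signal
// that a pattern may reference.
type SignalContext struct {
	TraceID            string
	SpanID             string
	SpanAttributes     map[string]string
	ResourceAttributes map[string]string
	ScopeAttributes    map[string]string
	Env                map[string]string // assumed source for ${env.NAME}
}

// attrKey extracts the key from ".foo", "[foo]" or `["foo"]`.
func attrKey(rest string) (string, bool) {
	if len(rest) > 1 && rest[0] == '.' {
		return rest[1:], true
	}
	if len(rest) > 2 && rest[0] == '[' && rest[len(rest)-1] == ']' {
		return strings.Trim(rest[1:len(rest)-1], `"'`), true
	}
	return "", false
}

// lookup resolves one placeholder name against the signal context.
func (s SignalContext) lookup(name string) (string, bool) {
	switch name {
	case "trace_id":
		return s.TraceID, true
	case "span_id":
		return s.SpanID, true
	}
	sources := map[string]map[string]string{
		"span.attributes":     s.SpanAttributes,
		"resource.attributes": s.ResourceAttributes,
		"scope.attributes":    s.ScopeAttributes,
		"env":                 s.Env,
	}
	for prefix, attrs := range sources {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		if key, ok := attrKey(name[len(prefix):]); ok {
			v, found := attrs[key]
			return v, found
		}
	}
	return "", false
}

// ExpandDestination substitutes every ${...} placeholder in pattern and
// returns an error if any placeholder cannot be resolved.
func ExpandDestination(pattern string, sc SignalContext) (string, error) {
	var missing []string
	out := placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "${" and trailing "}"
		v, ok := sc.lookup(name)
		if !ok {
			missing = append(missing, name)
			return m
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved placeholders: %s", strings.Join(missing, ", "))
	}
	return out, nil
}
```

Failing on an unresolved placeholder, rather than writing to a path that still contains a literal ${...}, is a design choice of this sketch and not something the proposal specifies.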

Telemetry data types supported

Traces

Is this a vendor-specific component?

  • [ ] This is a vendor-specific component
  • [ ] If this is a vendor-specific component, I am a member of the OpenTelemetry organization.
  • [ ] If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

braydonk, michaelsafyan, dashpole

Sponsor (optional)

dashpole

Additional context

No response

michaelsafyan avatar Jun 24 '24 21:06 michaelsafyan

I am willing to potentially sponsor this, but I would love to see if any others have needed to store very large or sensitive attributes separately. I plan to raise this tomorrow at the SIG meeting.

dashpole avatar Jun 25 '24 14:06 dashpole

I raised this at the SIG meeting today, but this wasn't an issue people on the call had run into before.

dashpole avatar Jun 26 '24 18:06 dashpole

There is some consideration of moving the "larger" genai attributes. https://github.com/open-telemetry/semantic-conventions/pull/483#discussion_r1387522358

dashpole avatar Jul 10 '24 23:07 dashpole

We (Langtrace) are also interested in testing out this span processor, as we are also thinking about this problem. We currently have 2 GenAI OTEL instrumentation libraries: Python and TypeScript.

karthikscale3 avatar Jul 11 '24 00:07 karthikscale3

The LLM Semconv WG is considering reporting prompts and completions in event payloads (and breaking them down into individual structured pieces) - https://github.com/open-telemetry/semantic-conventions/pull/980

Still, there is a possibility that prompts/completion messages could be big. There is interest in the community to record generated images, audio, etc for debugging/evaluation purposes.

From a general semconv perspective, we don't usually define span attributes that may contain unbounded data (gen_ai.prompt and completion are temporary exceptions), and are likely to recommend events/logs payloads for this.

In this context, it could make sense to also support blob uploads with LogProcessor. See also https://github.com/open-telemetry/semantic-conventions/pull/1217 where similar concerns have been raised for logs.

lmolkova avatar Jul 11 '24 23:07 lmolkova

In the interests of transparency, I have started related work on this here:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor

I originally started with a "processor", but I'm having doubts regarding whether this functionality is possible with a processor and am now looking into representing it as an "exporter" that wraps another exporter (but perhaps this is incorrect?). In any event, the (very early, not yet complete) code is in development here:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/exporter/blobattributeexporter

I appreciate the insight that this may shift to a different representation... with that in mind, I am going to try to make this more general. While I will start with span attributes to handle current representations, I will keep the naming general and allow this to grow to address write-aside to blob storage from other signal types and other parts of the signal.

michaelsafyan avatar Jul 12 '24 15:07 michaelsafyan

Quick Status update:

  • Still working on this
  • Current ETA expectation is ~2 weeks to get a working demo

Will give another update in 2 weeks time or when this is working, whichever is sooner.

michaelsafyan avatar Jul 18 '24 15:07 michaelsafyan

Apologies that this is taking longer than expected. I am, however, still working on this.

michaelsafyan avatar Aug 05 '24 08:08 michaelsafyan

The general shape of this is now present and can be found in:

https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/connector/blobattributeuploadconnector

I still need to polish this and create end-to-end testing, but there is probably enough here to get early feedback.

Note that while the original scope was intended to focus on spans, the above covers BOTH spans AND span events, given the pivot of the GenAI semantic conventions towards span event attributes.

I also pivoted from hand-rolling the string interpolation, to trying to leverage OTTL to do it:

... this required some hackery in OTTL, though, and am wondering if there is an even cleaner approach than this.

michaelsafyan avatar Aug 12 '24 21:08 michaelsafyan

@michaelsafyan thanks! To catch you up, the current semconv 1.27.0 already uses span events, so this is relevant.

What's a question mark to many is the change to log events. For example, not all backends know what to do with them, and there is some implied indexing. So, I would expect that once this is in, folks will want to transform log events (with span context) back to span events.

Do you feel up to adding a function like interpolateSpanEvent to do that? Something like logEventWithSpanContextToSpanEvent?

codefromthecrypt avatar Aug 13 '24 00:08 codefromthecrypt

@codefromthecrypt can you elaborate on what you mean by "folks will want to transform log events (with span context) back to span events"? Is that so that separate logs can get processed by this connector?

The way that I'm thinking about this is that blobattributeuploadconnector will be a generic component that enables:

  1. Uploading attribute content to a blob storage destination.
  2. Replacing the original attribute value with a "Foreign Attribute Reference" (see foreignattr.go)
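The two steps above can be sketched in Go as follows. This is only an illustration under stated assumptions: BlobWriter, ForeignRef, and UploadAndReplace are hypothetical names, attributes are modeled as a plain map instead of the collector's pdata types, and ForeignRef is a stand-in that does not reproduce the data model in foreignattr.go.

```go
package main

import "fmt"

// BlobWriter is a hypothetical abstraction over the blob storage backend.
type BlobWriter interface {
	Write(uri, contentType string, data []byte) error
}

// ForeignRef is a hypothetical stand-in for a "Foreign Attribute Reference":
// it records where the original value was written.
type ForeignRef struct {
	URI         string
	ContentType string
}

// String renders the reference in a human-readable form.
func (r ForeignRef) String() string {
	return fmt.Sprintf("ref(%s; %s)", r.URI, r.ContentType)
}

// UploadAndReplace writes the value stored under key to uri and, only if the
// write succeeds, replaces the value with a reference to that location.
// A missing key is not an error: there is simply nothing to upload.
func UploadAndReplace(attrs map[string]any, key, uri, contentType string, w BlobWriter) error {
	val, ok := attrs[key]
	if !ok {
		return nil
	}
	s, ok := val.(string)
	if !ok {
		return fmt.Errorf("attribute %q is not a string", key)
	}
	if err := w.Write(uri, contentType, []byte(s)); err != nil {
		// Leave the attribute untouched so the data is not lost.
		return fmt.Errorf("upload of %q failed: %w", key, err)
	}
	attrs[key] = ForeignRef{URI: uri, ContentType: contentType}
	return nil
}

// memWriter is an in-memory BlobWriter used to exercise the sketch.
type memWriter map[string][]byte

// Write stores a copy of data under uri and never fails.
func (m memWriter) Write(uri, _ string, data []byte) error {
	m[uri] = append([]byte(nil), data...)
	return nil
}
```

Replacing the value only after a successful write corresponds to the REPLACE_WITH_REFERENCE behavior from the original proposal; what should happen when the upload fails is not specified there, so keeping the original value is a choice made for this sketch.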

What I have there now targets:

  • span attributes
  • span event attributes

A logical expansion of this logic would be to also handle:

  • log attributes
  • (maybe?) log body

Other types of conversions (such as span events to logs, or logs back into span events) make sense and would be useful, but probably should be considered out of scope for this particular component (and should probably be tracked in a separate issue), though I agree that it is important for different users to decide whether their events data is recorded as events attached to a span or as separate logs (and that a connector is likely to be a good way to implement that).

michaelsafyan avatar Aug 13 '24 13:08 michaelsafyan

@michaelsafyan so the main q about log events was in relation to the genai spec which is about to switch to them. Since this spec is noted in the description, that's why I thought it might be in scope for this change/PR.

What do you think is a better place to move the topic of transforming "span events to log events" to? If you don't have a specific idea, I'll open a new issue; I just didn't want to duplicate this if it was in scope.

codefromthecrypt avatar Aug 14 '24 01:08 codefromthecrypt

I think new, separate issues for "Log Events -> Span Event Connector" and "Span Events -> Logs Connector" would make sense.

michaelsafyan avatar Aug 14 '24 17:08 michaelsafyan

cool. I opened https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34695 first, and if I made any mistakes in the description please correct if you have karma to do so, or ask me to, if you don't.

codefromthecrypt avatar Aug 15 '24 02:08 codefromthecrypt

Just providing another update, since it has been a while.

I was out on vacation last week and had other work to catch up on this past week.

I am hoping to resume this work this coming week.

This is still on my plate.

michaelsafyan avatar Aug 30 '24 19:08 michaelsafyan

Quick status update:

  • Believe that the code (for spans and span events) is largely complete, but bugs may turn up as tests are written
  • Iterating on unit tests (traces_test.go).

I am, however, encountering merge conflicts when attempting to sync from upstream ... so this may require some additional work to resolve.

michaelsafyan avatar Sep 06 '24 20:09 michaelsafyan

Status update:

Still working on writing tests.

As usual, I am progressing from one kind of error to another.

Now the errors that I'm getting are related to the string interpolation library which relates to open issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34700

I'm also realizing that the data model in https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_writer_span_processor/connector/blobattributeuploadconnector/internal/foreignattr is one that probably requires more input/agreement in OTel SemConv. I will be opening up an issue there shortly to discuss further and to ensure that it won't block upstreaming this code when it is done.

michaelsafyan avatar Sep 24 '24 18:09 michaelsafyan

Status update: now have the string interpolation logic in OTTL working.

Next steps:

  • Complete end-to-end integration tests of existing logic
  • Add support for logs and event bodies
  • Start splitting out pieces of this and trying to upstream individual pieces

michaelsafyan avatar Oct 09 '24 17:10 michaelsafyan

Status update:

  • End-to-end integration tests of existing logic now pass

To keep the change from growing out of control and to prevent horrible merge conflicts down the road, I'm thinking about upstreaming parts of this piecemeal and then expanding capabilities rather than trying to include every single signal type from the outset before starting to upstream.

michaelsafyan avatar Oct 10 '24 21:10 michaelsafyan

I'm renaming this from blobattributeuploadconnector to simply blobuploadconnector given that we want to also be able to target event bodies (or sub-paths within them).

A renamed version now exists in this development branch:

  • https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_upload_connector/connector/blobuploadconnector

I'm going to work on getting pieces of this upstreamed and, in parallel, I am going to start a new development branch for adding capabilities related to logs. That work will proceed here:

  • https://github.com/michaelsafyan/open-telemetry.opentelemetry-collector-contrib/tree/blob_upload_connector_logs

michaelsafyan avatar Oct 14 '24 17:10 michaelsafyan

I will sponsor this component. Thanks @michaelsafyan for working on this!

dashpole avatar Oct 23 '24 20:10 dashpole

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] avatar Jan 01 '25 03:01 github-actions[bot]

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] avatar Mar 21 '25 03:03 github-actions[bot]

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions[bot] avatar May 20 '25 05:05 github-actions[bot]