opentelemetry-collector-contrib
New component: blob writer span processor
The purpose and use-cases of the new component
The Blob Writer Span Processor takes selected span content and writes it to a blob storage system.
This component is intended to address a number of concerns:
- Sensitivity of data: certain data may need to be retained for debugging but may not be suitable for access by all oncallers or others with access to general operational data; writing certain attributes to a separate blob storage system may allow finer-grained or alternative access restrictions to be applied than those of the general ops backend.
- Size of the data: some operational backends limit the size of the data they can receive; sending large attributes to a separate blob storage backend may avoid these limits.
- Costs of storage: while most operational data needs to be available quickly to address incidents, certain attributes may be accessed less frequently and may be suitable for lower-cost, long-term storage options.
Motivating Examples:
- HTTP request/response pairs stored in span attributes (`http.request.body.content` and `http.response.body.content`)
- LLM prompt/response pairs stored in span event attributes (`gen_ai.prompt` and `gen_ai.completion`)
Use Cases Related to the Examples:
- Access to this data requires restrictions beyond those of the general operations solution; writing it to a separate blob storage system allows additional access controls to be applied. Links to the destination let the data be located in a separate backend storage system that enforces the necessary access checks.
- Full requests/responses are rarely used by oncallers, typically only when an end user opens a ticket through the support channel; writing this data to a separate, low-cost storage system lets the user save on operations storage costs.
Example configuration for the component
The configuration consists of a list of `ConfigStanza`s:
`config := LIST[ConfigStanza]`
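To make the shape of `config := LIST[ConfigStanza]` concrete, here is a minimal Go sketch (Go being the collector's implementation language) of how the stanza list might be modeled, using the required fields and defaults described in the properties that follow. The type names, the `mapstructure` tags, the `ApplyDefaults`/`Validate` helpers, and the rule that `fraction_to_write` must lie in [0, 1] are illustrative assumptions, not part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// AttributeScope mirrors the proposed match_attribute_only_in values;
// the empty value means the key may match anywhere in the signal.
type AttributeScope string

const (
	ScopeAny      AttributeScope = ""
	ScopeSpan     AttributeScope = "SPAN"
	ScopeResource AttributeScope = "RESOURCE"
	ScopeScope    AttributeScope = "SCOPE"
	ScopeEvent    AttributeScope = "EVENT"
)

// ConfigStanza is a hypothetical Go model of one stanza. The mapstructure
// tags only illustrate how YAML keys could map to fields.
type ConfigStanza struct {
	MatchAttributeKey          string         `mapstructure:"match_attribute_key"`
	MatchAttributeOnlyIn       AttributeScope `mapstructure:"match_attribute_only_in"`
	DestinationURI             string         `mapstructure:"destination_uri"`
	ContentType                string         `mapstructure:"content_type"`
	FractionToWrite            *float64       `mapstructure:"fraction_to_write"` // nil means "not set"
	FractionWrittenBehavior    string         `mapstructure:"fraction_written_behavior"`
	FractionNotWrittenBehavior string         `mapstructure:"fraction_not_written_behavior"`
}

// Config corresponds to config := LIST[ConfigStanza].
type Config []ConfigStanza

// ApplyDefaults fills unset optional fields with the defaults from the proposal.
func (s *ConfigStanza) ApplyDefaults() {
	if s.ContentType == "" {
		s.ContentType = "AUTO"
	}
	if s.FractionToWrite == nil {
		all := 1.0
		s.FractionToWrite = &all
	}
	if s.FractionWrittenBehavior == "" {
		s.FractionWrittenBehavior = "REPLACE_WITH_REFERENCE"
	}
	if s.FractionNotWrittenBehavior == "" {
		s.FractionNotWrittenBehavior = "DROP"
	}
}

// Validate checks required fields and enumerated values.
func (s ConfigStanza) Validate() error {
	if s.MatchAttributeKey == "" {
		return errors.New("match_attribute_key is required")
	}
	if s.DestinationURI == "" {
		return errors.New("destination_uri is required")
	}
	switch s.MatchAttributeOnlyIn {
	case ScopeAny, ScopeSpan, ScopeResource, ScopeScope, ScopeEvent:
	default:
		return fmt.Errorf("unsupported match_attribute_only_in %q", s.MatchAttributeOnlyIn)
	}
	// Assumption: a fraction outside [0, 1] is a configuration error.
	if f := s.FractionToWrite; f != nil && (*f < 0 || *f > 1) {
		return fmt.Errorf("fraction_to_write must be in [0, 1], got %v", *f)
	}
	switch s.FractionWrittenBehavior {
	case "", "REPLACE_WITH_REFERENCE", "KEEP", "DROP":
	default:
		return fmt.Errorf("unsupported fraction_written_behavior %q", s.FractionWrittenBehavior)
	}
	switch s.FractionNotWrittenBehavior {
	case "", "DROP", "KEEP":
	default:
		return fmt.Errorf("unsupported fraction_not_written_behavior %q", s.FractionNotWrittenBehavior)
	}
	return nil
}

func main() {
	cfg := Config{{
		MatchAttributeKey:    "http.request.body.content",
		MatchAttributeOnlyIn: ScopeSpan,
		DestinationURI:       "gs://example-bucket/${trace_id}/${span_id}/request.json",
		ContentType:          "application/json",
	}}
	for i := range cfg {
		cfg[i].ApplyDefaults()
		if err := cfg[i].Validate(); err != nil {
			fmt.Println("invalid stanza:", err)
			continue
		}
		fmt.Println(cfg[i].MatchAttributeKey, *cfg[i].FractionToWrite,
			cfg[i].FractionWrittenBehavior, cfg[i].FractionNotWrittenBehavior)
	}
}
```

Using a pointer for `fraction_to_write` lets an explicit `0.0` (write nothing) be distinguished from an unset value that should default to `1.0`.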
Each config stanza defines how to handle exactly one attribute key. The properties of the stanza are:
- `match_attribute_key`: (REQUIRED) The exact attribute key to match (e.g. `http.request.body.content`).
- `match_attribute_only_in`: (OPTIONAL) Restricts matching of the key to a specific part of the signal. Supported values include:
  - `SPAN`: only look at span-level attributes (not resource, scope, or event attributes)
  - `RESOURCE`: only look at resource-level attributes (not span, scope, or event attributes)
  - `SCOPE`: only look at scope-level attributes (not span, resource, or event attributes)
  - `EVENT`: only look at event-level attributes (not span, resource, or scope attributes)
- `destination_uri`: (REQUIRED) The pattern describing where to write the data.
  - Ex: `gs://example-bucket/full-http/request/payloads/${trace_id}/${span_id}.txt`
  - Patterns may reference other parts of the signal, including:
    - `trace_id`
    - `span_id`
    - `resource.attributes`
    - `span.attributes`
    - `scope.attributes`
  - Keys can be referenced with dot or bracket notation (e.g. `span.attributes.foo` or `span.attributes[foo]`).
- `content_type`: (OPTIONAL) Indicates the content type of the attribute (default: `AUTO`). Options include:
  - `AUTO`: attempt to infer the content type automatically
  - `extract_from: expr`: derive it from other information in the signal (e.g. `extract_from: span.attributes["http.request.header.content-type"]`)
  - any literal string (e.g. `"application/json"`): use a static value
- `fraction_to_write`: (OPTIONAL) The fraction of matching payloads to write, allowing payloads to be downsampled. Defaults to `1.0` (i.e. 100%).
- `fraction_written_behavior`: (OPTIONAL) What to do with the attribute when its payload is written. Defaults to `REPLACE_WITH_REFERENCE`. Options include:
  - `REPLACE_WITH_REFERENCE`: replace the value with a reference to the destination location
  - `KEEP`: the write is a copy; the original data is not altered
  - `DROP`: remove the attribute, so the span does not record that a write happened
- `fraction_not_written_behavior`: (OPTIONAL) What to do with the attribute when its payload is not selected for writing. Defaults to `DROP`. Options include:
  - `DROP`: remove the attribute in its entirety
  - `KEEP`: leave the original data unmodified
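The `destination_uri` substitution rules described above could be implemented along the lines of the following Go sketch. `SignalView`, `ExpandDestination`, and `resolve` are hypothetical names, and the flat string maps stand in for the collector's pdata types. Two behaviors are assumptions not stated in the proposal: with dot notation, everything after the prefix is treated as the key (so `span.attributes.http.method` looks up `http.method`), and attribute values are passed through `url.PathEscape` so a value containing `/` cannot create extra path segments. References outside the listed parts of the signal return an error in this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// SignalView is a hypothetical, flattened stand-in for the span, scope, and
// resource data a real processor would read through pdata.
type SignalView struct {
	TraceID, SpanID    string
	ResourceAttributes map[string]string
	ScopeAttributes    map[string]string
	SpanAttributes     map[string]string
}

// placeholder matches ${...} references in a destination_uri pattern.
var placeholder = regexp.MustCompile(`\$\{([^}]+)\}`)

// resolve returns the value for a single reference such as "trace_id",
// "span.attributes.foo", or `resource.attributes["service.name"]`.
func resolve(ref string, sv SignalView) (string, error) {
	switch ref {
	case "trace_id":
		return sv.TraceID, nil
	case "span_id":
		return sv.SpanID, nil
	}
	sources := []struct {
		prefix string
		attrs  map[string]string
	}{
		{"resource.attributes", sv.ResourceAttributes},
		{"scope.attributes", sv.ScopeAttributes},
		{"span.attributes", sv.SpanAttributes},
	}
	for _, src := range sources {
		if !strings.HasPrefix(ref, src.prefix) {
			continue
		}
		rest := ref[len(src.prefix):]
		var key string
		switch {
		case strings.HasPrefix(rest, "."): // dot notation: everything after the dot is the key
			key = rest[1:]
		case len(rest) >= 2 && rest[0] == '[' && rest[len(rest)-1] == ']': // bracket notation
			key = strings.Trim(rest[1:len(rest)-1], `"`)
		default:
			continue
		}
		v, found := src.attrs[key]
		if !found {
			return "", fmt.Errorf("attribute %q referenced by ${%s} not found", key, ref)
		}
		// Assumption: escape attribute values so they cannot add path segments.
		return url.PathEscape(v), nil
	}
	return "", fmt.Errorf("unsupported reference ${%s}", ref)
}

// ExpandDestination substitutes every ${...} reference in pattern and
// reports the first reference that could not be resolved.
func ExpandDestination(pattern string, sv SignalView) (string, error) {
	var firstErr error
	out := placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		v, err := resolve(m[2:len(m)-1], sv) // strip the leading "${" and trailing "}"
		if err != nil && firstErr == nil {
			firstErr = err
		}
		return v
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

func main() {
	sv := SignalView{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
	}
	uri, err := ExpandDestination(
		"gs://example-bucket/full-http/request/payloads/${trace_id}/${span_id}.txt", sv)
	fmt.Println(uri, err)
}
```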
Here is a full example with the above in mind:
- match_attribute_key: http.request.body.content
  match_attribute_only_in: SPAN
  destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/request.json"
  content_type: "application/json"
- match_attribute_key: http.response.body.content
  match_attribute_only_in: SPAN
  destination_uri: "gs://${env.GCS_BUCKET}/${trace_id}/${span_id}/response.json"
  content_type: "application/json"
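For a stanza like those in the full example, the per-attribute decision implied by `fraction_to_write`, `content_type`, and the two behavior options might look like the following Go sketch. It is a hypothetical illustration: `Stanza`, `Outcome`, and `Decide` are invented names, the caller supplies the random sample, `http.DetectContentType` is only a stand-in for `AUTO` inference, `extract_from` is not handled, and the actual blob upload is out of scope.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// Stanza holds only the fields this sketch needs; names are hypothetical.
type Stanza struct {
	ContentType                string  // "AUTO" (or empty), or a literal such as "application/json"
	FractionToWrite            float64 // 0.0 to 1.0
	FractionWrittenBehavior    string  // REPLACE_WITH_REFERENCE (default), KEEP, or DROP
	FractionNotWrittenBehavior string  // DROP (default) or KEEP
}

// Outcome describes what would happen to one matched attribute.
type Outcome struct {
	Write       bool   // payload should be written to blob storage
	ContentType string // content type to record for the blob, if written
	KeepAttr    bool   // attribute remains on the span
	NewValue    string // attribute value after processing, if kept
}

// Decide applies one stanza to a matched attribute value. sample is a number
// in [0, 1) chosen by the caller; the payload is written when
// sample < FractionToWrite, so 1.0 always writes and 0.0 never does.
func Decide(s Stanza, value, destURI string, sample float64) Outcome {
	if sample >= s.FractionToWrite {
		if s.FractionNotWrittenBehavior == "KEEP" {
			return Outcome{KeepAttr: true, NewValue: value}
		}
		return Outcome{} // DROP: remove the attribute entirely
	}
	out := Outcome{Write: true, ContentType: s.ContentType}
	if out.ContentType == "" || out.ContentType == "AUTO" {
		// Stand-in for AUTO: sniff the payload with the standard library.
		out.ContentType = http.DetectContentType([]byte(value))
	}
	switch s.FractionWrittenBehavior {
	case "KEEP": // the write is a copy; leave the attribute unchanged
		out.KeepAttr, out.NewValue = true, value
	case "DROP": // remove the attribute after writing
	default: // REPLACE_WITH_REFERENCE: point the attribute at the blob
		out.KeepAttr, out.NewValue = true, destURI
	}
	return out
}

func main() {
	s := Stanza{ContentType: "application/json", FractionToWrite: 1.0}
	o := Decide(s, `{"user":"alice"}`, "gs://example-bucket/t/s/request.json", rand.Float64())
	fmt.Printf("%+v\n", o)
}
```

Note that `http.DetectContentType` does not recognize JSON and reports a JSON payload as `text/plain; charset=utf-8`, so a real `AUTO` implementation would likely need more than content sniffing; an explicit `content_type`, as in the full example, sidesteps this. Drawing `sample` independently per span is the simplest choice; deriving it from the trace ID instead would keep or drop payloads consistently across a trace.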
Telemetry data types supported
Traces
Is this a vendor-specific component?
- [ ] This is a vendor-specific component
- [ ] If this is a vendor-specific component, I am a member of the OpenTelemetry organization.
- [ ] If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.
Code Owner(s)
braydonk, michaelsafyan, dashpole
Sponsor (optional)
dashpole
Additional context
No response