aws-otel-lambda
ADOT Collector Dropping Exports in Lambda Environment
Describe the bug
I have a manually instrumented Java Lambda with the ADOT OTel Lambda layer, with the following setup:
Java SDK -> ADOT Collector -> OpenSearch exporter -> OpenSearch Ingestion Pipeline
I expect all of my spans to be exported by the collector to the OpenSearch Ingestion pipeline, but the collector appears to be shut down right after the function ends, and in roughly 30% of invocations the last span is not exported.
I get the following error:
{"level":"error","ts":1711232220.1311545,"caller":"exporterhelper/common.go:49","msg":"Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.","kind":"exporter","data_type":"traces","name":"otlphttp","dropped_items":1,"error":"request is cancelled or timed out failed to make an HTTP request: Post \"https://opensearch-pipeline-ozauvc3fcrr3dz6we2uophr43u.us-west-2.osis.amazonaws.com/entry-pipeline/v1/traces\": context canceled","stacktrace":"go.opentelemetry.io/collector/exporter/exporterhelper.(*errorLoggingRequestSender).send \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:49 go.opentelemetry.io/collector/exporter/exporterhelper.(*baseExporter).send \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:193 go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func1 \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:98 go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces \tgo.opentelemetry.io/collector/[email protected]/traces.go:25 go.opentelemetry.io/collector/internal/fanoutconsumer.(*tracesConsumer).ConsumeTraces \tgo.opentelemetry.io/[email protected]/internal/fanoutconsumer/traces.go:73 go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces \tgo.opentelemetry.io/collector/[email protected]/traces.go:25 go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export \tgo.opentelemetry.io/collector/receiver/[email protected]/internal/trace/otlp.go:41 go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp.rawTracesServer.Export \tgo.opentelemetry.io/collector/[email protected]/ptrace/ptraceotlp/grpc.go:89 go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1 \tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:310 go.opentelemetry.io/collector/config/configgrpc.(*GRPCServerSettings).toServerOption.enhanceWithClientInformation.func9 \tgo.opentelemetry.io/collector/config/[email protected]/configgrpc.go:396 go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler \tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:312 google.golang.org/grpc.(*Server).processUnaryRPC \tgoogle.golang.org/[email protected]/server.go:1343 google.golang.org/grpc.(*Server).handleStream \tgoogle.golang.org/[email protected]/server.go:1737 google.golang.org/grpc.(*Server).serveStreams.func1.1 \tgoogle.golang.org/[email protected]/server.go:986"}
Steps to reproduce
- Use the collector config attached below
- Create an ARM64 Java 11 Lambda with the latest OTel collector Lambda layer, ARN: arn:aws:lambda:us-west-2:901920570463:layer:aws-otel-collector-arm64-ver-0-90-1:1
- Create one trace with one span in it. The last line of the Java function should be span.end() (see the sketch below).
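For context, here is a minimal sketch of the kind of handler described above (illustrative only; the class, tracer, and span names are assumptions, not taken from the actual function):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

// Manually instrumented handler: one trace, one span, span.end() last.
public class ExampleHandler implements RequestHandler<String, String> {
    private static final Tracer tracer =
        GlobalOpenTelemetry.getTracer("example-lambda");

    @Override
    public String handleRequest(String input, Context context) {
        Span span = tracer.spanBuilder("handle-request").startSpan();
        try {
            // ... business logic ...
            return "ok";
        } finally {
            span.end(); // last statement executed before the function returns
        }
    }
}
```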
What did you expect to see?
I expect to see all spans being exported to the otlphttp endpoint of the OpenSearch Ingestion pipeline.
What did you see instead?
I saw the last span being dropped.
What version of collector/language SDK did you use?
Collector Lambda Layer: arn:aws:lambda:us-west-2:901920570463:layer:aws-otel-collector-arm64-ver-0-90-1:1
What language layer did you use?
Java
CloudWatch Logs: log-events-viewer-result.csv
Collector Config
```yaml
extensions:
  sigv4auth:
    region: "us-west-2"
    service: "osis"

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "localhost:4317"
      http:
        endpoint: "localhost:4318"

exporters:
  logging:
  awsxray:
  otlphttp:
    traces_endpoint: "https://opensearch-pipeline-ozauvc3fcrr3dz6we2uophr43u.us-west-2.osis.amazonaws.com/entry-pipeline/v1/traces"
    auth:
      authenticator: sigv4auth
    compression: none

service:
  extensions: [sigv4auth]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray, otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [logging]
  telemetry:
    metrics:
      address: localhost:8888
```
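For what it's worth, the error message above suggests enabling sending_queue. A hedged variant of the otlphttp exporter section is sketched below (values are illustrative and untested in this setup); it may help with transient failures, although it does not by itself guarantee the collector finishes exporting before the environment is frozen or shut down:

```yaml
exporters:
  otlphttp:
    traces_endpoint: "https://opensearch-pipeline-ozauvc3fcrr3dz6we2uophr43u.us-west-2.osis.amazonaws.com/entry-pipeline/v1/traces"
    auth:
      authenticator: sigv4auth
    compression: none
    # Illustrative values only; tune for the workload.
    timeout: 5s
    sending_queue:
      enabled: true
      queue_size: 100
    retry_on_failure:
      enabled: true
      max_elapsed_time: 10s
```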
Additional context
Adding a 1 second sleep before my Lambda exits solves the problem, but shouldn't the Lambda environment design make sure to flush all spans within the collector before the collector shuts down?
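For clarity, the sleep workaround mentioned above is nothing more than something like this at the end of the handler (sketch only):

```java
// Workaround: give the collector time to finish exporting before the
// execution environment is frozen. Not a real fix.
try {
    Thread.sleep(1000);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
```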
Related to #787, I believe
What you could do is act on the SIGTERM signal instead of adding a sleep; an example of how to use that can be found here: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda/tree/main/java-demo
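A rough sketch of that approach, assuming the SDK instance is reachable from the handler class (OtelBootstrap.initSdk() is a placeholder for however the SDK is actually built, not part of the linked sample):

```java
import io.opentelemetry.sdk.OpenTelemetrySdk;
import java.util.concurrent.TimeUnit;

public class ShutdownAwareHandler {
    // Placeholder: however the SDK is actually constructed in the function.
    private static final OpenTelemetrySdk openTelemetrySdk = OtelBootstrap.initSdk();

    static {
        // With an external extension registered (e.g. the collector layer),
        // the runtime receives SIGTERM before shutdown and JVM shutdown hooks run.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Drain the application side: this flushes the SDK's buffered spans
            // to the collector, not the collector's own export queue.
            openTelemetrySdk.getSdkTracerProvider().forceFlush().join(2, TimeUnit.SECONDS);
            openTelemetrySdk.getSdkTracerProvider().shutdown().join(2, TimeUnit.SECONDS);
        }));
    }
}
```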
Ah, I see. Is there a way to force flush all spans from the collector on the SIGTERM signal? Because from the application's side, it seems that all spans are exported to the collector.