aws-otel-lambda
[Python] SerializationException for XRay exporter
I have a python 3.8 lambda instrumented with the aws layer, after following the documentation here: https://aws-otel.github.io/docs/getting-started/lambda/lambda-python
- Layer version: aws-otel-python38-ver-1-3-0:1 (contains OpenTelemetry for Python v1.3.0 with Contrib v0.22b0 and ADOT Collector for Lambda v0.11.0)
- The lambda is a scheduled lambda (using AWS eventbridge)
- I have not made any changes to the default collector config provided by the layer:
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  logging:
  awsxray:
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging]
After running the lambda, the layer initialises successfully, but there is an error with the xray exporter after the function execution completes:
2021-08-02T11:12:09.614Z error exporterhelper/queued_retry.go:245 Exporting failed. Try enabling retry_on_failure config option. {
"kind": "exporter",
"name": "awsxray",
"error": "Permanent error: SerializationException: \n\tstatus code: 400
[...]
2021-08-03T10:53:10.222Z error exporterhelper/queued_retry.go:175 Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.
{
"kind": "exporter",
"name": "awsxray",
"dropped_items": 2
}
I have not tried the suggestions to survive temporary failures because this error happens consistently.
Is there something missing in my setup or is it a bug somewhere?
Hey @bellmatt !
Thanks for filing this issue and describing your setup. Just to confirm: when you say you didn't modify the collector config, do you mean you expect it to be the same config we have in this repo, correct?
I just tried using the ARN arn:aws:lambda:us-west-2:901920570463:layer:aws-otel-python38-ver-1-3-0:1, followed the steps in the docs you linked, did not modify the collector config either, and followed this quick tutorial on how to trigger a Lambda function with AWS EventBridge. Doing all that, my Lambda runs successfully and exports to X-Ray, where I can see the trace. When I check the CloudWatch logs for this Lambda function I don't see the error you mentioned.
Is it possible for you to try manually triggering your Lambda and seeing if traces appear in X-Ray that way? If that's not possible, would you be able to describe your Lambda function in more detail? I'm not as suspicious of EventBridge, since it only calls the Lambda function without doing much else to modify any OTel setup.
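If it helps, a minimal way to trigger the function manually outside the console is a quick boto3 call. This is only a sketch; the function name and payload below are placeholders, not values from your setup:

```python
# Sketch: invoke the Lambda synchronously so any export errors show up
# in the same CloudWatch log stream right away.
import json

import boto3

client = boto3.client("lambda")

response = client.invoke(
    FunctionName="my-function",        # placeholder: your function name or ARN
    InvocationType="RequestResponse",  # synchronous invocation
    Payload=json.dumps({}).encode(),   # placeholder: any payload your handler accepts
)
print(response["StatusCode"], response["Payload"].read())
```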
Something else that could help is posting more of your CloudWatch logs, to see if they reveal anything else that looks off in the OTel setup on Lambda.
Thanks!
Hi @NathanielRN! sorry for the very late reply and thanks for taking a look and trying to reproduce!
I've had some more time to retry this after updating to the latest layer (python38-1-5-0), but I'm still seeing the error.
Hopefully these answer your questions:
- Triggering the lambda manually results in the same error
- I do see a trace in Xray, but it's the same one I would get just by enabling active tracing on the lambda
- It's a fairly small python 3.8 lambda that makes several external HTTP calls and puts a file in S3, otherwise very little is different from the sample app in this repo
- I am effectively using the default config from this repo yes, I can verify this in the logs too
And here are all the OTel-related logs I get in CloudWatch:
| timestamp | message |
| --- | --- |
| 1636110267822 | 2021/11/05 11:04:27 [collector] Launching OpenTelemetry Lambda extension, version: v0.1.0 |
| 1636110267822 | 2021-11-05T11:04:27.822Z info service/collector.go:303 Starting otelcol... {"Version": "v0.1.0", "NumCPU": 2} |
| 1636110267822 | 2021-11-05T11:04:27.822Z info service/collector.go:242 Loading configuration... |
| 1636110267825 | 2021-11-05T11:04:27.825Z info service/collector.go:258 Applying configuration... |
| 1636110267829 | 2021-11-05T11:04:27.829Z info builder/exporters_builder.go:264 Exporter was built. {"kind": "exporter", "name": "logging"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/exporters_builder.go:264 Exporter was built. {"kind": "exporter", "name": "awsxray"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:214 Pipeline was built. {"pipeline_name": "metrics", "pipeline_datatype": "metrics"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:214 Pipeline was built. {"pipeline_name": "traces", "pipeline_datatype": "traces"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/receivers_builder.go:227 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "traces"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/receivers_builder.go:227 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "metrics"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info service/service.go:143 Starting extensions... |
| 1636110267833 | 2021-11-05T11:04:27.831Z info service/service.go:188 Starting exporters... |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "awsxray"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/exporters_builder.go:98 Exporter started. {"kind": "exporter", "name": "awsxray"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/exporters_builder.go:93 Exporter is starting... {"kind": "exporter", "name": "logging"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/exporters_builder.go:98 Exporter started. {"kind": "exporter", "name": "logging"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info service/service.go:193 Starting processors... |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:52 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:63 Pipeline is started. {"pipeline_name": "metrics", "pipeline_datatype": "metrics"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:52 Pipeline is starting... {"pipeline_name": "traces", "pipeline_datatype": "traces"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/pipelines_builder.go:63 Pipeline is started. {"pipeline_name": "traces", "pipeline_datatype": "traces"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info service/service.go:198 Starting receivers... |
| 1636110267833 | 2021-11-05T11:04:27.831Z info builder/receivers_builder.go:71 Receiver is starting... {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.831Z info otlpreceiver/otlp.go:75 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info otlpreceiver/otlp.go:137 Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info otlpreceiver/otlp.go:75 Starting GRPC server on endpoint 0.0.0.0:55680 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info otlpreceiver/otlp.go:93 Starting HTTP server on endpoint 0.0.0.0:4318 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info otlpreceiver/otlp.go:159 Setting up a second HTTP listener on legacy endpoint 0.0.0.0:55681 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info otlpreceiver/otlp.go:93 Starting HTTP server on endpoint 0.0.0.0:55681 {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info builder/receivers_builder.go:76 Receiver started. {"kind": "receiver", "name": "otlp"} |
| 1636110267833 | 2021-11-05T11:04:27.832Z info service/collector.go:206 Setting up own telemetry... |
| 1636110267835 | 2021-11-05T11:04:27.835Z info service/telemetry.go:99 Serving Prometheus metrics {"address": ":8888", "level": 0, "service.instance.id": "7cbb0866-e130-4cc3-a119-7370cdfe3456"} |
| 1636110267835 | 2021-11-05T11:04:27.835Z info service/collector.go:218 Everything is ready. Begin running and processing data. |
| 1636110267839 | 2021/11/05 11:04:27 Registered extension ID: "5d0c57a4-c58d-4a3f-9d84-f9c0a1d74ee6" |
| 1636110267839 | 2021/11/05 11:04:27 [collector] Register response: { |
| 1636110267839 | "functionName": "fcn-xxxโ, |
| 1636110267839 | "functionVersion": "$LATEST", |
| 1636110267839 | "handler": "app.lambda_handler" |
| 1636110267839 | } |
| 1636110267839 | 2021/11/05 11:04:27 [collector] Waiting for event... |
| 1636110269201 | START RequestId: 9bedd347-e140-4730-96f6-3e5a9a342cde Version: $LATEST |
| 1636110269201 | EXTENSION Name: collector State: Ready Events: [INVOKE,SHUTDOWN] |
| 1636110269203 | 2021/11/05 11:04:29 [collector] Received event: { |
| 1636110269203 | "eventType": "INVOKE", |
| 1636110269203 | "deadlineMs": 1636110509202, |
| 1636110269203 | "requestId": "9bedd347-e140-4730-96f6-3e5a9a342cde", |
| 1636110269203 | "invokedFunctionArn": "xxx", |
| 1636110269203 | "tracing": { |
| 1636110269203 | "type": "X-Amzn-Trace-Id", |
| 1636110269203 | "value": "Root=1-61850fbb-5a15b15d1a7576de62df2228;Parent=11818bad54f5b1cb;Sampled=1" |
| 1636110269203 | } |
| 1636110269203 | } |
| 1636110269203 | 2021/11/05 11:04:29 [collector] Waiting for event... |
[ edit: I have removed the function logging ]
| 1636110283823 | 2021-11-05T11:04:43.822Z error exporterhelper/queued_retry.go:254 Exporting failed. Try enabling retry_on_failure config option. {"kind": "exporter", "name": "awsxray", "error": "Permanent error: SerializationException: \n\tstatus code: 400, request id: ac117941-92be-4543-840d-3f06e6b12211"} |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:254 |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:116 |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:174 |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2 |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:95 |
| 1636110283823 | go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces |
| 1636110283823 | go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29 |
| 1636110283823 | go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export |
| 1636110283823 | go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60 |
| 1636110283823 | go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:84 |
| 1636110283823 | go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1 |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217 |
| 1636110283823 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1 |
| 1636110283823 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:327 |
| 1636110283823 | go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219 |
| 1636110283823 | google.golang.org/grpc.(*Server).processUnaryRPC |
| 1636110283823 | google.golang.org/[email protected]/server.go:1297 |
| 1636110283823 | google.golang.org/grpc.(*Server).handleStream |
| 1636110283823 | google.golang.org/[email protected]/server.go:1626 |
| 1636110283823 | google.golang.org/grpc.(*Server).serveStreams.func1.2 |
| 1636110283823 | google.golang.org/[email protected]/server.go:941 |
| 1636110283823 | 2021-11-05T11:04:43.823Z error exporterhelper/queued_retry.go:176 Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures. {"kind": "exporter", "name": "awsxray", "dropped_items": 2} |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:176 |
| 1636110283823 | go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2 |
| 1636110283823 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:95 |
| 1636110283823 | go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces |
| 1636110283823 | go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29 |
| 1636110283823 | go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export |
| 1636110283823 | go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60 |
| 1636110283823 | go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:84 |
| 1636110283823 | go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1 |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217 |
| 1636110283823 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1 |
| 1636110283823 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:327 |
| 1636110283823 | go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler |
| 1636110283823 | go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219 |
| 1636110283823 | google.golang.org/grpc.(*Server).processUnaryRPC |
| 1636110283823 | google.golang.org/[email protected]/server.go:1297 |
| 1636110283823 | google.golang.org/grpc.(*Server).handleStream |
| 1636110283823 | google.golang.org/[email protected]/server.go:1626 |
| 1636110283823 | google.golang.org/grpc.(*Server).serveStreams.func1.2 |
| 1636110283823 | google.golang.org/[email protected]/server.go:941 |
| 1636110283845 | END RequestId: 9bedd347-e140-4730-96f6-3e5a9a342cde |
| 1636110283845 | REPORT RequestId: 9bedd347-e140-4730-96f6-3e5a9a342cde Duration: 14638.64 ms Billed Duration: 14639 ms Memory Size: 256 MB Max Memory Used: 255 MB Init Duration: 1599.57 ms XRAY TraceId: 1-61850fbb-5a15b15d1a7576de62df2228 SegmentId: 11818bad54f5b1cb Sampled: true |
Is there a way to enable debug logging for aws-otel-lambda that may add more context for the SerializationException?
Thanks for the added context @bellmatt ! I especially appreciate you manually triggering the Lambda so we can eliminate Eventbridge and the scheduling from the equation for now.
It makes sense that you see the traces for just Lambda in X-Ray, as those are sent by the Lambda service independent of OpenTelemetry and so regardless of OTel's success/failure those traces should make it to X-Ray every time.
The logs are also helpful, at least to confirm that the problem occurs when the ADOT collector (packaged with the python38-ver-1-5-0:3 layer) exports the traces, and not in the OTel Python SDK itself.
However, I have never seen this Serialization error from the collector... You say that:
It's a fairly small python 3.8 lambda that makes several external HTTP calls and puts a file in S3
I'm wondering if somehow the upstream OTel Python instrumentation (bundled in the layer) of your HTTP calls is trying to record a request/response parameter that throws an exception at the point of being exported by the awsxray exporter in the ADOT collector. I'm less suspicious of S3 because you're just "put"-ing and not "get"-ing, so I would think there's less chance of wonky data in the serialization step...
Unfortunately it's not easy to add logging to the Lambda layers because they're already published with their set configurations... But it would be helpful to see what kind of traces OTel Python is sending to the collector and whether anything is suspiciously un-serializable. Since you say your app is not too different from the upstream sample app for the OTel Python Lambda function, maybe you can comment out your code and try the sample app code first, to confirm the difference is the Python function you are using (a stripped-down sketch follows below). If it works with the sample app, you may have discovered a bug in the OTel Python instrumentation or the collector itself that we should fix upstream, where a user's code cannot be instrumented correctly. At that point we can run your code with a modified layer that has debug logging enabled. Otherwise, if you still see the same error, then we can double-check exactly what we package in the Python layer, since it wouldn't depend on user code at that point.
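For example, a stripped-down handler along these lines exercises the same auto-instrumented paths (an outbound HTTP call via requests and an S3 put via boto3) without the rest of the business logic. This is only a sketch, not the actual sample app; the bucket name and URL are placeholders:

```python
# Sketch of a minimal handler to compare against the failing function.
# BUCKET and the URL below are placeholders, not values from this issue.
import boto3
import requests

s3 = boto3.client("s3")
BUCKET = "my-test-bucket"


def lambda_handler(event, context):
    # One outbound HTTP call, picked up by the requests instrumentation.
    resp = requests.get("https://httpbin.org/get", timeout=5)

    # One S3 call, picked up by the botocore instrumentation.
    s3.put_object(Bucket=BUCKET, Key="otel-test.txt", Body=resp.text.encode())

    return {"statusCode": 200}
```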
If you're able to share which libraries your Lambda function uses and how it uses them, that would be helpful! I understand if you can't, and we'll try to work around it in that case. Either way, taking the steps above first is worthwhile to determine whether we are going down the right path.
I'll ping @jefchien and @sethAmazon in here who work on the https://github.com/aws-observability/aws-otel-collector just in case they've seen this error on the collector before.
Hi, I'm also facing this issue with the aws-otel-python38-ver-1-5-0:3 layer and my AWS SQS-triggered Lambda functions:
2021-11-17 06:53:56,972 Loading sqs:sqs
2021-11-17 06:53:56,973 Calling sqs:get_queue_url with {'QueueName': 'sqs-myqueue'}
2021-11-17 06:53:57,010 Response: {'QueueUrl': 'https://eu-west-3.queue.amazonaws.com/************/sqs-myqueue', 'ResponseMetadata': {'RequestId': 'd1c8d5cf-399a-5180-8269-8fc190de9b2e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'd1c8d5cf-399a-5180-8269-8fc190de9b2e', 'date': 'Wed, 17 Nov 2021 06:53:57 GMT', 'x-amzn-trace-id': 'Root=1-3e4c0f1e-cf56741be982fb48e29c3777;Parent=ee24374ead4a47d5;Sampled=1', 'content-type': 'text/xml', 'content-length': '340'}, 'RetryAttempts': 0}}
2021-11-17 06:53:57,010 Loading sqs:Queue
EXTENSION Name: collector State: Ready Events: [INVOKE,SHUTDOWN]
2021/11/17 06:53:57 [collector] Received event: {
"eventType": "INVOKE",
"deadlineMs": 1637132637131,
"requestId": "42505b52-6e26-5de6-8ca5-015cc71472f4",
"invokedFunctionArn": "arn:aws:lambda:eu-west-3:/************:function:myfunction",
"tracing": {
"type": "X-Amzn-Trace-Id",
"value": "Root=1-6194a702-5d2440a8f3a49b97335cd62c;Parent=3bb118a61ed3cef6;Sampled=1"
}
}
2021/11/17 06:53:57 [collector] Waiting for event...
2021-11-17 06:53:57,132 New SQS received. Message content: {'kind': 'custom', 'myfunction': 'myparams'}
[SNIP]
2021-11-17T06:53:59.550Z error exporterhelper/queued_retry.go:165 Exporting failed. Try enabling retry_on_failure config option. { "kind": "exporter", "name": "awsxray", "error": "Permanent error: SerializationException: \n\tstatus code: 400, request id: d0d75484-69c0-4aca-8ed3-16ccae02c729" }
--
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:165
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:99
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:113
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:102
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:325
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/[email protected]/server.go:1297
google.golang.org/grpc.(*Server).handleStream
google.golang.org/[email protected]/server.go:1626
google.golang.org/grpc.(*Server).serveStreams.func1.2
google.golang.org/[email protected]/server.go:941
2021-11-17T06:53:59.551Z error exporterhelper/queued_retry.go:101 Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures. { "kind": "exporter", "name": "awsxray", "dropped_items": 7 }
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:101
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:113
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:102
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:325
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/[email protected]/server.go:1297
google.golang.org/grpc.(*Server).handleStream
google.golang.org/[email protected]/server.go:1626
google.golang.org/grpc.(*Server).serveStreams.func1.2
google.golang.org/[email protected]/server.go:941
END RequestId: 42505b52-6e26-5de6-8ca5-015cc71472f4
REPORT RequestId: 42505b52-6e26-5de6-8ca5-015cc71472f4 Duration: 2441.25 ms Billed Duration: 2442 ms Memory Size: 256 MB Max Memory Used: 159 MB Init Duration: 2773.14 ms XRAY TraceId: 1-6194a702-5d2440a8f3a49b97335cd62c SegmentId: 3bb118a61ed3cef6 Sampled: true
However, API Gateway and manually triggered Lambda do not have the issue.
Hey @pdecat, thanks for providing your logs! Unfortunately it looks like they only include grpc code paths and don't contain logs from the awsxray exporter itself, which is where we'd need to look for the Serialization error; I was especially hoping the logs would reveal what the collector is trying to serialize.
As I mentioned in the comment above, I think this is more likely some miscommunication between the OTel Python SDK and the ADOT Collector, which uses the OTel Collector Contrib - AWS X-Ray Exporter. Either the SDK is not serializing correctly or the Collector is not. The fact that it is SQS, where users can send whatever they want, makes a Serialization error more likely, since the SDK or Collector may receive values they did not account for.
It also makes sense that you see the error with SQS but not EventBridge, because the JSON in the Lambda event is very different depending on whether the Lambda was triggered by SQS or by EventBridge. If we knew the Lambda event that causes the Serialization error with this layer, that would be a great help in reproducing it.
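One low-effort way to capture that exact event, assuming you can add a line to the handler, is to dump it to CloudWatch at the top of the function. A sketch:

```python
# Sketch: log the incoming Lambda event so it can be shared and replayed.
# default=str guards against values that aren't natively JSON-serializable.
import json


def lambda_handler(event, context):
    print(json.dumps(event, default=str))
    # ... rest of the handler ...
```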
However, as it is right now, this still isn't enough information to figure out why you are getting a Serialization error. Today I created a Lambda triggered from SQS with the aws-otel-python38-ver-1-5-0:3 Python layer and it completed successfully.
That's why the best way to get to the bottom of this is to get logs for what the SDK is sending to the collector. We want to see what traces are being sent by the OTel Python SDK before it gets to the collector. Like I said before this isn't necessarily easy with the already published layers, but the Lambda event is a great start. If that's not the issue then we should look into your Lambda code, then configuration, and so on.
If it looks fine then we want to investigate the ADOT Collector exporter code and want to figure out why it can't serialize the span it receives.
Please let me know if that makes sense! Also, please let me know if you're able to provide me with your trouble-causing Lambda event, your Lambda function code, your SQS setup, how you configure your Lambda function (it looks to be set to Active tracing), etc., so we can try to reproduce the issue.
@NathanielRN Thanks for the really detailed replies and additional context!
I have got this working with another of my python lambdas that has exactly the same config/layer setup. So I think your theory that this issue relates to the code that is being instrumented may be correct.
I'll look into how I can use a layer with debugging enabled to see what's going on with the function where I get the error
@bellmatt That's so awesome, I'm really glad to hear it!!
Yes, please let us know if you can share the code that reproduces this issue. Even if it is some unique user code, the exporter shouldn't be throwing an exception, or at least it should be a more helpful one... thanks again!
Hi @NathanielRN, I have enabled the logging exporter in debug logging level:
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  logging:
    loglevel: debug
  awsxray:
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging]
Here are traces with this configuration (slightly redacted but number of characters preserved):
2021/11/19 16:53:52 [collector] Waiting for event...
2021-11-19 16:53:52,551 New SQS received. Message content: {'kind': 'channel', 'channel_name': 'fr-team-test', 'message': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n'}
2021-11-19 16:53:52,797 Loading s3:s3
2021-11-19 16:53:52,816 Loading s3:Object
2021-11-19 16:53:52,817 Calling s3:get_object with {'Bucket': 's3-preprod-team-slack-bot-global-eu-west-3-012345678901', 'Key': 'cache/slack_channels.json'}
2021-11-19 16:53:53,012 Response: {'ResponseMetadata': {'RequestId': 'K848WGG97HHNX3H3', 'HostId': 'bAsPLmzu76mEWQt1N7wzpQ5vQa/f+llhlo0mW0R6H9Gx02uXWWxIKjyCROfWq5pnFmWk3/lp/94=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'bAsPLmzu76mEWQt1N7wzpQ5vQa/f+llhlo0mW0R6H9Gx02uXWWxIKjyCROfWq5pnFmWk3/lp/94=', 'x-amz-request-id': 'K848WGG97HHNX3H3', 'date': 'Fri, 19 Nov 2021 16:53:53 GMT', 'last-modified': 'Wed, 15 Sep 2021 08:53:26 GMT', 'etag': '"ca4e9884f9fb4f382fa0d9d54630c522"', 'x-amz-version-id': 'rhMyJVvp35YcSFcn6DinSOnpBeyoM8U5', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '3338817'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2021, 9, 15, 8, 53, 26, tzinfo=tzutc()), 'ContentLength': 3338817, 'ETag': '"ca4e9884f9fb4f382fa0d9d54630c522"', 'VersionId': 'rhMyJVvp35YcSFcn6DinSOnpBeyoM8U5', 'ContentType': 'binary/octet-stream', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x7fd18c122bb0>}
2021-11-19 16:53:53,497 Executing post_simple_message with parameter: {'channel_id': '***********', 'message': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n', 'kind': 'channel', 'channel_name': 'fr-team-test'}
/opt/python/slack_sdk/web/internal_utils.py:271: UserWarning: The `text` argument is missing in the request payload for a chat.postMessage call - It's a best practice to always provide a `text` argument when posting a message. The `text` argument is used in places where content cannot be rendered such as: system push notifications, assistive technology such as screen readers, etc.
warnings.warn(message, UserWarning)
2021-11-19 16:53:53,515 Sending a request - url: https://www.slack.com/api/chat.postMessage, query_params:
{}
, body_params:
{}
, files:
{}
, json_body: {'channel': '***********', 'blocks': [{'type': 'section', 'text': {'type': 'mrkdwn', 'text': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n'}}]}, headers: {'Content-Type': 'application/json;charset=utf-8', 'Authorization': '(redacted)', 'User-Agent': 'Bolt/1.6.1 Python/3.8.11 slackclient/3.11.2 Linux/4.14.246-198.474.amzn2.x86_64'}
2021-11-19 16:53:53,963 Received the following response - status: 200, headers: {'date': 'Fri, 19 Nov 2021 16:53:53 GMT', 'server': 'Apache', 'access-control-allow-origin': '*', 'referrer-policy': 'no-referrer', 'x-slack-backend': 'r', 'strict-transport-security': 'max-age=31536000; includeSubDomains; preload', 'access-control-allow-headers': 'slack-route, x-slack-version-ts, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags', 'access-control-expose-headers': 'x-slack-req-id, retry-after', 'x-oauth-scopes': 'chat:write,commands,incoming-webhook,app_mentions:read,channels:history,channels:read,groups:history,users:read,groups:read', 'x-accepted-oauth-scopes': 'chat:write', 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'cache-control': 'private, no-cache, no-store, must-revalidate', 'pragma': 'no-cache', 'x-xss-protection': '0', 'x-content-type-options': 'nosniff', 'x-slack-req-id': '7f632e42ebfab3c613c1309e26567ecb', 'vary': 'Accept-Encoding', 'content-type': 'application/json; charset=utf-8', 'x-envoy-upstream-service-time': '349', 'x-backend': 'main_normal main_bedrock_normal_with_overflow main_canary_with_overflow main_bedrock_canary_with_overflow main_control_with_overflow main_bedrock_control_with_overflow', 'x-server': 'slack-www-hhvm-main-iad-8mc7', 'x-slack-shared-secret-outcome': 'no-match', 'via': 'envoy-www-iad-u992, envoy-edge-lhr-selo', 'x-edge-backend': 'envoy-www', 'x-slack-edge-shared-secret-outcome': 'no-match', 'connection': 'close', 'transfer-encoding': 'chunked'}, body:
{
"ok": true,
"channel": "***********",
"ts": "1637340833.026600",
"message": {
"bot_id": "***********",
"type": "message",
"text": "This content can't be displayed.",
"user": "***********",
"ts": "1637340833.026600",
"team": "***********",
"bot_profile": {
"id": "***********",
"app_id": "A024MS7V8RY",
"name": "*****-Test",
"icons": {
"image_36": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_36.png",
"image_48": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_48.png",
"image_72": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_72.png"
},
"deleted": false,
"updated": 1627979390,
"team_id": "***********"
},
"blocks": [
{
"type": "section",
"block_id": "lgSW",
"text": {
"type": "mrkdwn",
"text": ":team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n",
"verbatim": false
}
}
]
}
}
2021-11-19T16:53:54.015Z INFO loggingexporter/logging_exporter.go:41 TracesExporter
{
"#spans": 5
}
2021-11-19T16:53:54.015Z DEBUG loggingexporter/logging_exporter.go:51 ResourceSpans #0
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.5.0)
-> cloud.region: STRING(eu-west-3)
-> cloud.provider: STRING(aws)
-> faas.name: STRING(team-slack-bot)
-> faas.version: STRING($LATEST)
-> service.name: STRING(team-slack-bot)
InstrumentationLibrarySpans #0
InstrumentationLibrary opentelemetry.instrumentation.botocore 0.24b0
Span #0
Trace ID : 7afe3987293befaea019f397f1c61de1
Parent ID :
ID : 82495e7a7211ec94
Name : secretsmanager
Kind : SPAN_KIND_CLIENT
Start time : 2021-11-19 16:53:52.004638344 +0000 UTC
End time : 2021-11-19 16:53:52.077158974 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> aws.operation: STRING(GetSecretValue)
-> aws.region: STRING(eu-west-3)
-> aws.service: STRING(secretsmanager)
-> aws.request_id: STRING(e411e04e-da75-472f-ba63-e8a26a35313b)
-> retry_attempts: INT(0)
-> http.status_code: INT(200)
Span #1
Trace ID : 7ea2d8472bb17969d5d8724bed11ba60
Parent ID :
ID : 2528edcde60e561f
Name : secretsmanager
Kind : SPAN_KIND_CLIENT
Start time : 2021-11-19 16:53:52.315715612 +0000 UTC
End time : 2021-11-19 16:53:52.39317674 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> aws.operation: STRING(GetSecretValue)
-> aws.region: STRING(eu-west-3)
-> aws.service: STRING(secretsmanager)
-> aws.request_id: STRING(5a947ff6-d90b-4ec1-a4a8-aa84cfd06714)
-> retry_attempts: INT(0)
-> http.status_code: INT(200)
Span #2
Trace ID : bca5ac01b559ce88ffe08d2b04753c20
Parent ID :
ID : 033f72c42403a4b2
Name : sqs
Kind : SPAN_KIND_CLIENT
Start time : 2021-11-19 16:53:52.412566147 +0000 UTC
End time : 2021-11-19 16:53:52.447696825 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> aws.operation: STRING(GetQueueUrl)
-> aws.region: STRING(eu-west-3)
-> aws.service: STRING(sqs)
-> aws.request_id: STRING(cdef0b08-70c7-59b7-a8d1-5119a8273cfa)
-> retry_attempts: INT(0)
-> http.status_code: INT(200)
Span #3
Trace ID : 6197d69d69846f2d792172ae1d92a3a3
Parent ID : 64988892b117aa41
ID : 8ed5326f67f8fd4b
Name : s3
Kind : SPAN_KIND_CLIENT
Start time : 2021-11-19 16:53:52.835015137 +0000 UTC
End time : 2021-11-19 16:53:53.012767137 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> aws.operation: STRING(GetObject)
-> aws.region: STRING(eu-west-3)
-> aws.service: STRING(s3)
-> aws.request_id: STRING(K848WGG97HHNX3H3)
-> retry_attempts: INT(0)
-> http.status_code: INT(200)
InstrumentationLibrarySpans #1
InstrumentationLibrary opentelemetry.instrumentation.aws_lambda 0.16.dev0
Span #0
Trace ID : 6197d69d69846f2d792172ae1d92a3a3
Parent ID : 937e3f012e0b69cb
ID : 64988892b117aa41
Name : *****.main.lambda_handler
Kind : SPAN_KIND_SERVER
Start time : 2021-11-19 16:53:52.55144591 +0000 UTC
End time : 2021-11-19 16:53:53.964310971 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> faas.execution: STRING(bae4419a-1677-5e6f-858a-1b8bcea4b43f)
-> faas.id: STRING(arn:aws:lambda:eu-west-3:012345678901:function:team-slack-bot)
-> faas.name: STRING(team-slack-bot)
-> faas.version: STRING($LATEST)
2021-11-19T16:53:54.228Z error exporterhelper/queued_retry.go:165 Exporting failed. Try enabling retry_on_failure config option.
{
"kind": "exporter",
"name": "awsxray",
"error": "Permanent error: SerializationException: \n\tstatus code: 400, request id: 6181b57b-c12a-4520-8318-1520f2d712cb"
}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:165
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:99
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:113
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/service/internal/fanoutconsumer.traceConsumer.ConsumeTraces
go.opentelemetry.io/[email protected]/service/internal/fanoutconsumer/consumer.go:79
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:102
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:325
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/[email protected]/server.go:1297
google.golang.org/grpc.(*Server).handleStream
google.golang.org/[email protected]/server.go:1626
google.golang.org/grpc.(*Server).serveStreams.func1.2
google.golang.org/[email protected]/server.go:941
2021-11-19T16:53:54.228Z error exporterhelper/queued_retry.go:101 Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.
{
"kind": "exporter",
"name": "awsxray",
"dropped_items": 5
}
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:101
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:113
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/[email protected]/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/service/internal/fanoutconsumer.traceConsumer.ConsumeTraces
go.opentelemetry.io/[email protected]/service/internal/fanoutconsumer/consumer.go:79
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector/[email protected]/otlpgrpc/traces.go:102
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:217
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:325
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:219
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/[email protected]/server.go:1297
google.golang.org/grpc.(*Server).handleStream
google.golang.org/[email protected]/server.go:1626
google.golang.org/grpc.(*Server).serveStreams.func1.2
google.golang.org/[email protected]/server.go:941
END RequestId: bae4419a-1677-5e6f-858a-1b8bcea4b43f
REPORT RequestId: bae4419a-1677-5e6f-858a-1b8bcea4b43f Duration: 1680.31 ms Billed Duration: 1681 ms Memory Size: 256 MB Max Memory Used: 174 MB Init Duration: 2652.43 ms
XRAY TraceId: 1-6197d69d-69846f2d792172ae1d92a3a3 SegmentId: 79ecfe4c2fc4e126 Sampled: true
Also, I just noticed that the error does not happen with the very same Lambda function and some other SQS messages:
2021/11/19 16:54:03 [collector] Waiting for event...
2021-11-19 16:54:03,041 New SQS received. Message content: {'kind': 'channel', 'channel_name': 'fr-team-test', 'message': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n'}
2021-11-19 16:54:03,043 Executing post_simple_message with parameter: {'channel_id': '***********', 'message': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n', 'kind': 'channel', 'channel_name': 'fr-team-test'}
2021-11-19 16:54:03,043 Sending a request - url: https://www.slack.com/api/chat.postMessage, query_params:
{}
, body_params:
{}
, files:
{}
, json_body: {'channel': '***********', 'blocks': [{'type': 'section', 'text': {'type': 'mrkdwn', 'text': ':team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n'}}]}, headers: {'Content-Type': 'application/json;charset=utf-8', 'Authorization': '(redacted)', 'User-Agent': 'Bolt/1.6.1 Python/3.8.11 slackclient/3.11.2 Linux/4.14.246-198.474.amzn2.x86_64'}
2021-11-19 16:54:03,408 Received the following response - status: 200, headers: {'date': 'Fri, 19 Nov 2021 16:54:03 GMT', 'server': 'Apache', 'access-control-allow-origin': '*', 'referrer-policy': 'no-referrer', 'x-slack-backend': 'r', 'strict-transport-security': 'max-age=31536000; includeSubDomains; preload', 'access-control-allow-headers': 'slack-route, x-slack-version-ts, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags', 'access-control-expose-headers': 'x-slack-req-id, retry-after', 'x-oauth-scopes': 'chat:write,commands,incoming-webhook,app_mentions:read,channels:history,channels:read,groups:history,users:read,groups:read', 'x-accepted-oauth-scopes': 'chat:write', 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'cache-control': 'private, no-cache, no-store, must-revalidate', 'pragma': 'no-cache', 'x-xss-protection': '0', 'x-content-type-options': 'nosniff', 'x-slack-req-id': 'ca6443fe89944cf739993add6d0fbbe2', 'vary': 'Accept-Encoding', 'content-type': 'application/json; charset=utf-8', 'x-envoy-upstream-service-time': '279', 'x-backend': 'main_normal main_bedrock_normal_with_overflow main_canary_with_overflow main_bedrock_canary_with_overflow main_control_with_overflow main_bedrock_control_with_overflow', 'x-server': 'slack-www-hhvm-main-iad-8gnd', 'x-slack-shared-secret-outcome': 'no-match', 'via': 'envoy-www-iad-v704, envoy-edge-lhr-fg4e', 'x-edge-backend': 'envoy-www', 'x-slack-edge-shared-secret-outcome': 'no-match', 'connection': 'close', 'transfer-encoding': 'chunked'}, body:
{
"ok": true,
"channel": "***********",
"ts": "1637340843.026700",
"message": {
"bot_id": "***********",
"type": "message",
"text": "This content can't be displayed.",
"user": "***********",
"ts": "1637340843.026700",
"team": "***********",
"bot_profile": {
"id": "***********",
"app_id": "***********",
"name": "*****-Test",
"icons": {
"image_36": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_36.png",
"image_48": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_48.png",
"image_72": "https://avatars.slack-edge.com/2021-06-21/*************_8d9b3d67513acf315dde_72.png"
},
"deleted": false,
"updated": 1627979390,
"team_id": "***********"
},
"blocks": [
{
"type": "section",
"block_id": "iLVm",
"text": {
"type": "mrkdwn",
"text": ":team: Cloud Custodian report for account `012345678901`:\nmark-unattached-old-ebs-volumes-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-attached-ebs-volumes-marked-for-deletion affected aws.ebs count:\n- eu-west-3: 0\nunmark-used-ebs-snapshots-marked-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nmark-unused-old-ebs-snapshots-for-deletion affected aws.ebs-snapshot count:\n- eu-west-3: 0\nterminate-marked-ec2-instances affected aws.ec2 count:\n- eu-west-3: 0\ndelete-marked-ebs-volumes affected aws.ebs count:\n- eu-west-3: 0\nmark-old-ec2-instances-named-packer-builder-for-termination affected aws.ec2 count:\n- eu-west-3: 0\nunmark-used-amis-marked-for-deletion affected aws.ami count:\n- eu-west-3: 0\nmark-unused-old-amis-for-deregistration affected aws.ami count:\n- eu-west-3: 0\nderegister-marked-amis affected aws.ami count:\n- eu-west-3: 0\ndelete-marked-ebs-snapshots affected aws.ebs-snapshot count:\n- eu-west-3: 0\n",
"verbatim": false
}
}
]
}
}
2021-11-19T16:54:03.415Z INFO loggingexporter/logging_exporter.go:41 TracesExporter
{
"#spans": 1
}
2021-11-19T16:54:03.415Z DEBUG loggingexporter/logging_exporter.go:51 ResourceSpans #0
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.5.0)
-> cloud.region: STRING(eu-west-3)
-> cloud.provider: STRING(aws)
-> faas.name: STRING(team-slack-bot)
-> faas.version: STRING($LATEST)
-> service.name: STRING(team-slack-bot)
InstrumentationLibrarySpans #0
InstrumentationLibrary opentelemetry.instrumentation.aws_lambda 0.16.dev0
Span #0
Trace ID : 6197d6ab06a9913d0910e417499762de
Parent ID : 662168114861cffe
ID : 5591b7433c79b36d
Name : *****.main.lambda_handler
Kind : SPAN_KIND_SERVER
Start time : 2021-11-19 16:54:03.040898845 +0000 UTC
End time : 2021-11-19 16:54:03.40887126 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> faas.execution: STRING(8d6a3ba0-824f-556a-923b-61b15aacbf3a)
-> faas.id: STRING(arn:aws:lambda:eu-west-3:012345678901:function:team-slack-bot)
-> faas.name: STRING(team-slack-bot)
-> faas.version: STRING($LATEST)
END RequestId: 8d6a3ba0-824f-556a-923b-61b15aacbf3a
REPORT RequestId: 8d6a3ba0-824f-556a-923b-61b15aacbf3a Duration: 395.10 ms Billed Duration: 396 ms Memory Size: 256 MB Max Memory Used: 175 MB
XRAY TraceId: 1-6197d6ab-06a9913d0910e417499762de SegmentId: 243432f020d35f04 Sampled: true
@pdecat Thanks for providing that!
This is super interesting. However, I'm still unable to reproduce the Serialization Error on my side. I used the same configuration and was able to call the S3, SQS, and secretsmanager services and have the traces exported to AWS X-Ray.
It is difficult to say exactly what is causing the Serialization Error but we can try some things.
- It definitely feels like user code is getting a response that is unserializable. Then, when the OTel Python SDK tries to put that in a trace, it makes it to the Collector and the error occurs. When I did a diff of your two logs I noticed that the two Lambdas had the same SQS message content according to the "Message content:" log. However, the "bad" invocation has a call to s3 whose response includes <botocore.response.StreamingBody object at 0x7fd18c122bb0>. Maybe that's not allowed? It would be interesting to remove this call to S3 and see if that is the difference causing the Serialization Error. Similarly, you could extend this trial-and-error approach by removing the other calls in the Lambda to find out which one is causing the problem. I know you said it works fine when triggered manually vs. triggered from SQS, but it also looks like you've been able to trigger from SQS and have it work successfully? That's why I believe it is possible the cause is something in your actual Lambda function code and not in the SQS event.
- Note that in my example I call S3 ListBuckets, not GetObject, in case that is what causes the Serialization Error.
- I noticed that you make an external call to what seems to be a Slack bot? I'm curious why this does not show up as a trace in the loggingexporter output. This python38-ver-1-5-0:3 layer has instrumentation for the requests library, so unless you are using urllib3 we should see a trace for it in the logging exporter.
- In the "bad" example you have 5 traces: 2 from secretsmanager, 1 from s3, 1 from sqs and 1 from aws_lambda. Well, that's what we assume based on the loggingexporter output. I wonder if a trace is being created in addition to these, but the loggingexporter is also experiencing a SerializationError and silently failing, as opposed to throwing an error like the awsxray exporter... That's why, while adding debug logging to the Collector is better, it would be best to see exactly what the OTel Python SDK is sending to the Collector that breaks it. It's hard to do that without the Python code that reproduces the error... This is why you could take your Lambda code, run it locally using manual instrumentation as explained in our docs, and turn on debug logging in the OTel Python app locally (see the sketch after this list). Then you'll see what gets sent to the Collector before it makes it there!
- Otherwise, if there are no hidden traces, then it's very surprising that the awsxray exporter fails, because nothing in the loggingexporter's output looks particularly troublesome to serialize... In fact, it looks like exactly the calls we have regular CI tests for (calls using boto3 and the aws_lambda instrumentation itself)...
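As mentioned in the list above, a minimal local setup to see the spans the OTel Python SDK produces before they reach the collector could look like the sketch below. It prints spans to stdout instead of exporting them, and the two instrumentors shown are assumptions about what the app uses:

```python
# Sketch: run the handler locally and print the spans the SDK would
# otherwise send to the collector, so odd attribute values are visible.
from opentelemetry import trace
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Instrument the same libraries the Lambda layer auto-instruments.
BotocoreInstrumentor().instrument()
RequestsInstrumentor().instrument()

# ...then call the handler directly, e.g. lambda_handler(test_event, None)
```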
Please let me know if you try these ideas out! The most helpful thing would definitely be a configuration that allows us to reproduce the error.
I can reproduce the error using function code that does nothing except call S3 GetObject (or PutObject). Whereas if I then change the code to call ListObjects instead on the same bucket, it works and I can see trace data in xray and S3 appears on the service map.
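For reference, the failing handler is essentially equivalent to this sketch (the bucket and key are placeholders, not the real values):

```python
# Sketch of the minimal reproduction described above: a handler that does
# nothing except call S3 GetObject. Bucket and key are placeholders.
import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    obj = s3.get_object(Bucket="my-test-bucket", Key="some-object.txt")
    return {"statusCode": 200, "contentLength": obj["ContentLength"]}
```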
@bellmatt I am able to GetObject from an S3 bucket and see the Service Trace map succeed.
Is what you have in the S3 content non-sensitive data that I could test trying to get myself? Otherwise could you try doing GetObject for something simpler (like a simple text file) and tell me if that works?
Hey @bellmatt @pdecat! I was working on some stuff on AWS Lambda and I finally came across your error by chance!
However, I am thinking it is an issue in the upstream ADOT Collector instead of anything in control of the managed Lambda layer.
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/6808
Can you both please look at it when you have a chance and let me know if this looks like something you are trying to do with your Lambdas, i.e. creating a trace with a parent context that has a non-X-Ray-compliant format?
If you are, then electing not to use the auto-instrumentation /opt/otel-instrument script in your AWS_LAMBDA_EXEC_WRAPPER environment variable and instead instrumenting manually should prevent this issue, at the cost of losing auto-instrumentation and the trace generated by the opentelemetry-instrumentation-aws-lambda package, which instruments the invocation of your Lambda function.
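For anyone switching to manual instrumentation, a rough sketch of a setup that keeps generated trace IDs in an X-Ray-compliant format (using AwsXRayIdGenerator from the opentelemetry-sdk-extension-aws package; the localhost:4317 OTLP endpoint is an assumption about the in-layer collector's default gRPC port) could look like this:

```python
# Sketch: manual instrumentation with X-Ray compatible trace IDs.
# The endpoint below is an assumption, not a value taken from this issue.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(id_generator=AwsXRayIdGenerator())
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)


def lambda_handler(event, context):
    # Wrap the function body in a span instead of relying on the
    # aws_lambda auto-instrumentation.
    with tracer.start_as_current_span("lambda_handler"):
        # ... your function code ...
        return {"statusCode": 200}
```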
Will notify our team about the issue above so we can investigate as soon as possible!
The code has been merged and should be available in release 16 of OTel. If you would like to test this code, I can create an unofficial Docker image for you.
This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled
Sorry it took me so long to get back to you, but this error doesn't occur with the latest aws-otel-lambda layer, so I think this can be closed