aws-otel-lambda
ReadTimeout: HTTPConnectionPool(host='localhost', port=4318)
We're seeing an error in multiple Lambdas. The config file is the default (for now), and the permissions are:
x_ray_tracing = {
  effect = "Allow"
  actions = [
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
    "xray:GetSamplingRules",
    "xray:GetSamplingTargets",
    "xray:GetSamplingStatisticSummaries",
  ]
  resources = ["*"]
}
The body of the error is as follows:
Traceback (most recent call last):
File "/var/task/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/var/task/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/var/lang/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/var/lang/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/var/lang/lib/python3.9/http/client.py", line 281, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/var/lang/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "/var/task/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/var/task/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/var/task/urllib3/packages/six.py", line 770, in reraise
raise value
File "/var/task/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/var/task/urllib3/connectionpool.py", line 451, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/var/task/urllib3/connectionpool.py", line 340, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/python/opentelemetry/sdk/trace/export/__init__.py", line 358, in _export_batch
self.span_exporter.export(self.spans_list[:idx]) # type: ignore
File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 139, in export
resp = self._export(serialized_data)
File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 110, in _export
return self._session.post(
File "/var/task/requests/sessions.py", line 577, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 122, in instrumented_request
return _instrumented_requests_call(
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 152, in _instrumented_requests_call
return call_wrapped()
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 120, in call_wrapped
return wrapped_request(self, method, url, *args, **kwargs)
File "/var/task/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 142, in instrumented_send
return _instrumented_requests_call(
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 152, in _instrumented_requests_call
return call_wrapped()
File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 140, in call_wrapped
return wrapped_send(self, request, **kwargs)
File "/var/task/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/var/task/requests/adapters.py", line 532, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)
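The read timeout is raised by the OTLP/HTTP span exporter while POSTing to the collector extension on localhost:4318. A quick way to distinguish "collector never started" from "collector is up but hanging while flushing spans" is a plain TCP probe (a diagnostic sketch, not part of the layer):

```python
import socket

def collector_reachable(host: str = "localhost", port: int = 4318,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the OTLP/HTTP port succeeds.

    A refused connection suggests the collector extension never started;
    a successful connect followed by the ReadTimeout above suggests it is
    up but cannot flush spans to its backend (e.g. no outbound route).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling this at the top of the handler (temporarily) narrows the problem down before digging into VPC routing.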
There doesn't seem to be a common factor among the Lambdas we are seeing fail. One Lambda is making calls to a server we have set up in ECS, and another is making calls to a third-party server. Another Lambda making calls to a third-party server is running fine.
Hi @M3gar00,
Could you provide the lambda layer versions you are using? Are you using automatic or manual instrumentation in your lambda? Did you follow any guides/steps for setup that we could use to try to reproduce this issue?
@bryan-aguilar
I'm using the Lambda layer outlined in the tutorial for setting up automatic instrumentation.
Layer ARN:
arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-11-1:1
Additional information: I assumed that, since the exporter was attempting to communicate on a specific port, opening that port via a security group rule would help. This change had no effect on the problem.
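Security group changes wouldn't be expected to help here: port 4318 belongs to the collector extension listening on loopback inside the Lambda sandbox, so that traffic never crosses the network. Per the OTLP exporter environment-variable conventions, the Python exporter defaults to localhost:4318 unless an endpoint is overridden. A simplified sketch of how the traces endpoint is resolved (not the actual exporter code; it covers only the common variables):

```python
import os

# Default OTLP/HTTP endpoint used when no environment override is set.
DEFAULT_ENDPOINT = "http://localhost:4318"

def resolve_traces_endpoint(env=None) -> str:
    """Sketch of OTLP/HTTP traces-endpoint resolution.

    Per-signal OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used verbatim;
    the generic OTEL_EXPORTER_OTLP_ENDPOINT gets /v1/traces appended.
    """
    if env is None:
        env = os.environ
    per_signal = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if per_signal:
        return per_signal
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_ENDPOINT)
    return base.rstrip("/") + "/v1/traces"
```

This is why the error always names localhost:4318 regardless of where the spans are ultimately shipped.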
I am seeing this error too. @M3gar00 did you find a solution?
@marc-at -- We ended up dropping ADOT from our Lambdas and using the Datadog interface. 😞 One thing we did discover with Datadog is that we were running into memory issues, and perhaps that was also the cause of this problem? The fix for the Datadog problem was to increase the maximum memory available to the Lambda.
@M3gar00 thanks for the response! I was able to get past the issue. With a little support from AWS, we figured out my Lambda function was running in a VPC and needed a few updates. I had to ensure the subnet had a NAT gateway for internet access. Alternatively, depending on the NAT setup, configuring HTTP proxies in my Lambda function also helped.
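The NAT/proxy fix makes sense: the export to the backend needs outbound HTTPS, and in a VPC subnet with no route to the internet that request hangs until the exporter's 10-second read timeout fires. If a proxy is the chosen route, Python's requests/urllib stack picks it up from the standard environment variables. A minimal stdlib check of what the HTTP stack will actually use (the proxy hostname below is hypothetical):

```python
import urllib.request

def active_proxies() -> dict:
    """Return the proxy mapping Python's HTTP stack will use, derived
    from HTTP_PROXY / HTTPS_PROXY / NO_PROXY environment variables."""
    return urllib.request.getproxies()

# In the Lambda configuration you might set, for example:
#   HTTPS_PROXY=http://proxy.internal:3128   (hypothetical host)
#   NO_PROXY=localhost,127.0.0.1
```

If you go the proxy route, NO_PROXY should include localhost so the in-function export to the collector on port 4318 is not itself sent through the proxy.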
We should check if there are recommended values for memory in lambda functions that use the ADOT layer.
This appears to be a network issue, and the solution mentioned in the comment above may be appropriate. Please share if you are still having the issue; if possible, include brief details of your network configuration and environment settings, excluding any sensitive data.
We recommend using the latest layer ARN: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-15-0:2
This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled
This issue was closed because it has been marked as stale for 30 days with no activity.
If it helps others: we also saw this error when hitting memory limits, and increasing the Lambda's memory fixed it.