aws-otel-lambda

ReadTimeout: HTTPConnectionPool(host='localhost', port=4318)

Open M3gar00 opened this issue 3 years ago • 5 comments

We're seeing an error in multiple Lambdas. The config file is the default (for now), and the permissions are:

x_ray_tracing = {
  effect = "Allow"
  actions = [
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
    "xray:GetSamplingRules",
    "xray:GetSamplingTargets",
    "xray:GetSamplingStatisticSummaries",
  ]
  resources = ["*"]
}

The body of the error is as follows:

Traceback (most recent call last):
  File "/var/task/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/var/task/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/var/lang/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/var/lang/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/var/lang/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/var/task/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/var/task/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/var/task/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/var/task/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/var/task/urllib3/connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/var/task/urllib3/connectionpool.py", line 340, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python/opentelemetry/sdk/trace/export/__init__.py", line 358, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 139, in export
    resp = self._export(serialized_data)
  File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 110, in _export
    return self._session.post(
  File "/var/task/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 122, in instrumented_request
    return _instrumented_requests_call(
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 152, in _instrumented_requests_call
    return call_wrapped()
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 120, in call_wrapped
    return wrapped_request(self, method, url, *args, **kwargs)
  File "/var/task/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 142, in instrumented_send
    return _instrumented_requests_call(
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 152, in _instrumented_requests_call
    return call_wrapped()
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 140, in call_wrapped
    return wrapped_send(self, request, **kwargs)
  File "/var/task/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/var/task/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)
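The `read timeout=10` in the traceback is the OTLP/HTTP exporter's default 10-second request timeout for posting spans to the collector extension on `localhost:4318`. As a mitigation sketch only (not a confirmed fix for this issue), the timeout can be raised via the standard `OTEL_EXPORTER_OTLP_TIMEOUT` environment variable, which the Python SDK appears to read in seconds; the `aws_lambda_function` resource name here is a placeholder:

```hcl
resource "aws_lambda_function" "example" {
  # ...existing function configuration...

  environment {
    variables = {
      # Raise the OTLP exporter request timeout (default 10).
      OTEL_EXPORTER_OTLP_TIMEOUT = "30"
    }
  }
}
```

Raising the timeout only masks a slow or unreachable collector, so it is worth checking why the extension is not responding within 10 seconds in the first place.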

There doesn't seem to be a common factor among the Lambdas we are seeing fail. One Lambda is making calls to a server we have set up in ECS, and another is making calls to a third-party server. Yet another Lambda that calls a third-party server is running fine.

M3gar00 avatar Jul 01 '22 15:07 M3gar00

Hi @M3gar00,

Could you provide the lambda layer versions you are using? Are you using automatic or manual instrumentation in your lambda? Did you follow any guides/steps for setup that we could use to try to reproduce this issue?

bryan-aguilar avatar Jul 20 '22 23:07 bryan-aguilar

@bryan-aguilar

I'm using the Lambda layer outlined in the tutorial for setting up automatic instrumentation. Layer ARN: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-11-1:1
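For reference, a minimal Terraform sketch of how that layer is typically attached for automatic instrumentation; the resource name and omitted function settings are placeholders, not the reporter's actual configuration:

```hcl
resource "aws_lambda_function" "example" {
  # ...function code, role, runtime = "python3.9", etc...

  layers = [
    "arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-11-1:1",
  ]

  environment {
    variables = {
      # Enables the layer's auto-instrumentation wrapper script.
      AWS_LAMBDA_EXEC_WRAPPER = "/opt/otel-instrument"
    }
  }

  tracing_config {
    mode = "Active"
  }
}
```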

Additional information: since the exporter was attempting to communicate on a specific port, I assumed that opening that port via a Security Group rule might help. That change had no effect on the problem.

M3gar00 avatar Jul 21 '22 15:07 M3gar00

I am seeing this error too. @M3gar00 did you find a solution?

marc-at avatar Oct 10 '22 17:10 marc-at

@marc-at -- We ended up dropping ADOT from our Lambdas and using the Datadog interface. 😞 One thing we did discover with DD is that we were running into memory issues, and perhaps that was causing this problem? The fix for the DD problem was to increase the maximum memory available to the Lambda.

M3gar00 avatar Oct 11 '22 12:10 M3gar00

@M3gar00 thanks for the response! I was able to get past the issue. With a little support from AWS, we figured out my Lambda function was running in a VPC and needed a few updates. I needed to ensure the subnet had a NAT gateway for internet access. Alternatively, depending on the NAT setup, configuring HTTP proxies in my Lambda function also helped.
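A sketch of the proxy approach in Terraform, assuming a hypothetical in-VPC proxy host; note that localhost should stay un-proxied so the exporter can still reach the collector extension on port 4318:

```hcl
environment {
  variables = {
    HTTP_PROXY  = "http://proxy.internal.example:3128"  # hypothetical proxy host
    HTTPS_PROXY = "http://proxy.internal.example:3128"
    NO_PROXY    = "localhost,127.0.0.1"  # keep collector traffic local
  }
}
```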

marc-at avatar Oct 11 '22 12:10 marc-at

We should check if there are recommended values for memory in lambda functions that use the ADOT layer.

rapphil avatar Dec 30 '22 21:12 rapphil

> @M3gar00 thanks for the response! I was able to get past the issue. With a little support from AWS, we figured out my lambda function was running in a VPC and needed a few updates. I needed to ensure subnet had a NAT gateway for internet access. Alternately, or based on NAT setup, http proxies in my lambda function also helped.

This appears to be a network issue, and the solution mentioned in the quoted comment above should be appropriate. Please let us know if you are still seeing the issue. If possible, kindly share brief details of your network configuration and environment settings, excluding any sensitive data. We also recommend using the latest layer ARN: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-15-0:2

vasireddy99 avatar Jan 13 '23 05:01 vasireddy99

This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled

github-actions[bot] avatar Apr 16 '23 20:04 github-actions[bot]

This issue was closed because it has been marked as stale for 30 days with no activity.

github-actions[bot] avatar May 21 '23 20:05 github-actions[bot]

If it helps others: we also saw this error while running into memory issues, and increasing the Lambda's memory fixed it for us.
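If memory pressure is indeed the cause, a minimal sketch of raising the function's memory in Terraform (the value is illustrative; tune it for your workload):

```hcl
resource "aws_lambda_function" "example" {
  # ...existing function configuration...

  memory_size = 512  # MB; up from the 128 MB default
}
```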

KimboTodd avatar May 20 '24 17:05 KimboTodd