botocore

very rare ReferenceError

Open tsah-alike opened this issue 2 years ago • 10 comments

Describe the bug

We are running python 3.8 on AWS lambda. We use boto3. The code is patched by aws-xray-sdk and lumigo tracer. Very rarely (every few months) we encounter a ReferenceError. This will happen again and again as long as the same instance of lambda is reused.

We could not find a way to reproduce it. All we have is the stack trace.

Expected Behavior

Should not raise ReferenceError.

Current Behavior

ReferenceError is raised

This particular one happened during PutItem on a DynamoDB Table object.

[ERROR] ReferenceError: weakly-referenced object no longer exists
*** application part of stack trace ***
    table_handler.update_item(
  File "/var/runtime/boto3/resources/factory.py", line 580, in do_action
    response = action(self, *args, **kwargs)
  File "/var/runtime/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/opt/python/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/wrapt/wrappers.py", line 644, in __call__
    return self._self_wrapper(self.__wrapped__, self._self_instance,
  File "/opt/python/aws_xray_sdk/ext/botocore/patch.py", line 38, in _xray_traced_botocore
    return xray_recorder.record_subsegment(
  File "/opt/python/aws_xray_sdk/core/recorder.py", line 462, in record_subsegment
    six.raise_from(exc, exc)
  File "", line 3, in raise_from
  File "/opt/python/aws_xray_sdk/core/recorder.py", line 457, in record_subsegment
    return_value = wrapped(*args, **kwargs)
  File "/opt/python/botocore/client.py", line 943, in _make_api_call
    http, parsed_response = self._make_request(
  File "/opt/python/botocore/client.py", line 966, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/opt/python/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/opt/python/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/opt/python/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/opt/python/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/opt/python/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/opt/python/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/opt/python/botocore/signers.py", line 105, in handler
    return self.sign(operation_name, request)
  File "/opt/python/botocore/signers.py", line 149, in sign
    signature_version = self._choose_signer(
  File "/opt/python/botocore/signers.py", line 219, in _choose_signer
    handler, response = self._event_emitter.emit_until_response(

Reproduction Steps

I'm sorry, we did not manage to reproduce this.

Possible Solution

It seems like the RequestSigner class holds a weak reference to some object, but the case where that object has already been garbage collected is not handled. To fix this, wrap the expression at botocore/signers.py, line 219, in a try/except block and handle ReferenceError.
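As a minimal sketch of the failure mode described above, the following uses only the standard library. The `Emitter` class and the `call_through_proxy` helper are hypothetical stand-ins, not botocore code; they only show how a `weakref.proxy` raises ReferenceError once its referent is gone, and what catching it could look like.

```python
import gc
import weakref


class Emitter:
    """Hypothetical stand-in for an event emitter (not botocore's class)."""

    def emit_until_response(self, event_name):
        return None, f"handled {event_name}"


def call_through_proxy(proxy, event_name):
    """Call the proxied emitter; report a dead referent instead of raising."""
    try:
        return proxy.emit_until_response(event_name)
    except ReferenceError:
        # The object behind the proxy has already been garbage collected.
        return None, None


emitter = Emitter()
proxy = weakref.proxy(emitter)
alive_result = call_through_proxy(proxy, "choose-signer")

del emitter   # drop the only strong reference to the emitter
gc.collect()  # CPython frees it on `del` already; this makes the intent explicit
dead_result = call_through_proxy(proxy, "choose-signer")
```

Catching the exception only hides the symptom: the caller still has no emitter to work with, so keeping the referent alive is likely the more useful fix.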

Additional Information/Context

We are running python 3.8 on AWS lambda. We use the official runtime. We use boto3. The code is patched by aws-xray-sdk and lumigo tracer.

SDK version used

unknown, included with python3.8 AWS Lambda runtime

Environment details (OS name and version, etc.)

python3.8 lambda runtime, intel processor

tsah-alike, Jun 15 '23 09:06

Thanks @tsah-alike for reaching out. Which version of botocore are you using? Can you share any code snippets that resulted in this error?

I think it would be worth opening an issue directly with the aws-xray-sdk-python repository for this.

tim-finnigan, Jun 15 '23 21:06

Thanks for responding @tim-finnigan, I'll open an issue there as well. The version is unknown since it's coming from the AWS Lambda runtime. My guess is it's pretty recent but not the most recent.

tsah-alike, Jun 19 '23 06:06

We first noticed this bug about a year ago.

tsah-alike, Jun 19 '23 06:06

Hi @tsah-alike thanks for following up. Per the documentation on Lambda runtimes the packaged botocore version would be botocore-1.29.90. And it looks like aws-xray-sdk-python accepts versions going back as far as 1.11.3. You could confirm your version by checking the logs (adding boto3.set_stream_logger('') to your script) or just importing and printing it:

import botocore
print(botocore.__version__)

I'll link the related issue you created in the other repository: https://github.com/aws/aws-xray-sdk-python/issues/394

If you can share any other details such as code snippets or steps to reproduce then that may help narrow down the issue.

tim-finnigan, Jun 19 '23 18:06

The version is 1.29.156. We couldn't create a minimal working example. The last failure was something like this (simplified):

    import boto3
    from boto3.dynamodb.conditions import Key
    from botocore.config import Config

    config = Config(connect_timeout=1, read_timeout=5, retries={'max_attempts': 3})
    session = boto3.Session()
    resource = session.resource('dynamodb', config=config)
    # ... business logic ...
    res = resource.query(KeyConditionExpression=Key('post_id').eq(post_id), IndexName='post_id')
    # ... business logic ...
    resource.update_item(
        Key=key,
        UpdateExpression='set is_deleted = :is_deleted',
        ExpressionAttributeValues={':is_deleted': True},
    )
    # ^^^ ReferenceError is thrown here

This worked perfectly for months, but once that ReferenceError was thrown, the same lambda failed the exact same way 266 times, even though the botocore session and the DDB resource are recreated each time. Once the lambda instance was replaced, it stopped happening, and it worked fine ever since (last week).

tsah-alike, Jun 20 '23 08:06

I know it's not a lot of information. I did try my best to find steps to reproduce.

tsah-alike, Jun 20 '23 08:06

Encountered same issue implementing https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/LambdaRedis.step2.html and running a Python3.10 lambda. Locally on my mac I do not get this issue unless it is an async function.

StickStack, Apr 29 '24 23:04

I have encountered this issue also. Also using the steps above for signing requests to auth on ElastiCache.

Digging a tiny bit into the source, I notice this, in the RequestSigner init method:

# We need weakref to prevent leaking memory in Python 2.6 on Linux 2.6
self._event_emitter = weakref.proxy(event_emitter)

Which looks like it could be the culprit (though I suspect there is more subtlety going on here that I am not aware of from my extremely cursory reading).

I am wondering how safe it would be to simply replace this weakref with a normal reference, since I am not using Python 2.6 on Linux 2.6. I am going to give this a shot, see whether it leads to a memory leak in my use case, and report back.

Hopefully the above can be a jumping-off point for a more qualified boto developer to pick this up and have a look at what's going on.

l0x, Jul 23 '24 12:07
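The experiment l0x describes can be modelled without botocore. In the sketch below, `Emitter`, `WeakSigner`, `StrongSigner`, and `make` are all hypothetical names; `WeakSigner` only mirrors the `weakref.proxy(event_emitter)` line quoted above, and `StrongSigner` stores a normal reference instead. It assumes nothing else keeps the emitter alive, which may not match every real setup.

```python
import gc
import weakref


class Emitter:
    """Hypothetical stand-in for the event emitter."""

    def emit_until_response(self, event_name):
        return None, event_name


class WeakSigner:
    """Mirrors the quoted pattern: only a weak proxy to the emitter is stored."""

    def __init__(self, event_emitter):
        self._event_emitter = weakref.proxy(event_emitter)

    def choose(self):
        return self._event_emitter.emit_until_response("choose-signer")


class StrongSigner(WeakSigner):
    """Experimental variant: keep a normal (strong) reference instead."""

    def __init__(self, event_emitter):
        self._event_emitter = event_emitter


def make(signer_cls):
    # The emitter is created here and not stored anywhere else, so after this
    # function returns only the signer itself can keep it alive.
    return signer_cls(Emitter())


weak = make(WeakSigner)
strong = make(StrongSigner)
gc.collect()

try:
    weak.choose()
    weak_failed = False
except ReferenceError:
    weak_failed = True

strong_result = strong.choose()
```

One caveat on the strong variant: if the emitter also holds handlers bound to the signer, the two objects form a reference cycle. CPython's cyclic garbage collector can normally reclaim such cycles, but they are freed later than plain reference counting would free them.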

> Encountered same issue implementing https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/LambdaRedis.step2.html and running a Python3.10 lambda. Locally on my mac I do not get this issue unless it is an async function.

I'm using an adapted sample code in Python 3.12 lambda deployed with chalice 1.31.2. I set the redis_client outside the handler and when I try to use it in the deployed environment, I get this error:

ReferenceError: weakly-referenced object no longer exists
Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1762, in __call__
    return self.handler(event_obj)
  File "/var/task/app.py", line 119, in periodic_task
    some_func1()
  File "/var/task/app.py", line 113, in some_func1
    upsert_elasticache(my_list)
  File "/var/task/app.py", line 105, in upsert_elasticache
    redis_client.set(my_key, my_value)
  File "/var/task/redis/commands/core.py", line 2333, in set
    return self.execute_command("SET", *pieces, **options)
  File "/var/task/redis/client.py", line 545, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/var/task/redis/connection.py", line 1074, in get_connection
    connection.connect()
  File "/var/task/redis/connection.py", line 289, in connect
    self.on_connect()
  File "/var/task/redis/connection.py", line 330, in on_connect
    auth_args = cred_provider.get_credentials()
  File "/var/task/cachetools/__init__.py", line 741, in wrapper
    v = func(*args, **kwargs)
  File "/var/task/app.py", line 52, in get_credentials
    signed_url = self.request_signer.generate_presigned_url(
  File "/var/task/botocore/signers.py", line 349, in generate_presigned_url
    self.sign(
  File "/var/task/botocore/signers.py", line 149, in sign
    signature_version = self._choose_signer(
  File "/var/task/botocore/signers.py", line 231, in _choose_signer
    handler, response = self._event_emitter.emit_until_response(

foo-up, Aug 23 '24 07:08

We just ran across this, too, also using Redis, also using similar code to here. We are using python3.11, but we still get the same error emitted at the same place (when self._event_emitter is referenced). This is being initialized something like this:

        request_signer = RequestSigner(
            ServiceId("elasticache"),
            session.region_name,
            "elasticache",
            "v4",
            cast(Credentials, session.get_credentials()),
            session.events,
        )

My suspicion is that the session object (a boto3.Session) is not being explicitly referenced anywhere (perhaps a newer version of boto does this?), which could lead to it being destroyed. The weak reference to session.events isn't enough to keep it alive. I've tried to solve this by holding on to a reference to the session, e.g. self.session = session. That seems to have fixed the issue, at least for now.

mgmarino, Dec 17 '24 18:12
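The workaround mgmarino describes (keeping `self.session = session`) can be sketched with stand-in classes. Everything below is hypothetical: `Session`, `Events`, `Signer`, and `CredentialProvider` are simplified models, and the sketch assumes the session is the only owner of its event emitter. The `keep_session` flag exists purely to contrast the two behaviours.

```python
import gc
import weakref


class Events:
    """Hypothetical stand-in for the session's event emitter."""

    def emit_until_response(self, event_name):
        return None, event_name


class Session:
    """Hypothetical stand-in for a session that owns its event emitter."""

    def __init__(self):
        self.events = Events()


class Signer:
    """Stores only a weak proxy to the emitter, like the pattern in this thread."""

    def __init__(self, event_emitter):
        self._event_emitter = weakref.proxy(event_emitter)

    def sign(self):
        return self._event_emitter.emit_until_response("choose-signer")


class CredentialProvider:
    def __init__(self, session, keep_session):
        if keep_session:
            # The workaround: a strong reference keeps the session, and with it
            # the emitter, alive for as long as the provider exists.
            self.session = session
        self.request_signer = Signer(session.events)

    def get_credentials(self):
        return self.request_signer.sign()


fixed = CredentialProvider(Session(), keep_session=True)
broken = CredentialProvider(Session(), keep_session=False)
gc.collect()

fixed_result = fixed.get_credentials()
try:
    broken.get_credentials()
    broken_failed = False
except ReferenceError:
    broken_failed = True
```

In the `broken` case the temporary `Session()` is dropped as soon as the constructor returns, so the signer's proxy points at a dead emitter on first use.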

Hello, the error [ERROR] ReferenceError: weakly-referenced object no longer exists occurs when memory has been cleared or freed. In Botocore/Boto3, when a client is created, the reference should exist as long as the client exists. There could be something in your environment or code that is freeing the reference. Is anyone still experiencing this issue? If so, please let us know what you're doing, your environment and setup, and provide a minimal repro for this issue. Thank you.

adev-code, Nov 11 '25 02:11

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

github-actions[bot], Nov 22 '25 00:11

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot], Nov 24 '25 21:11