Make resources pickleable/serializable

Open maxrothman opened this issue 9 years ago • 20 comments

Boto3 resources (e.g. instances, s3 objects, etc.) are not pickleable and have no to_json() method or similar. Therefore, there's currently no way to cache resources retrieved via boto3. This is problematic when retrieving a large number of resources that change infrequently. Even a cache of 30s or so can greatly increase the performance of certain programs and drastically reduce the number of necessary API calls to AWS.

Would it be possible to have some way to serialize resources?
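
One workaround that needs no changes to boto3 is to cache plain data pulled from the resources rather than the resource objects themselves. The sketch below is only an illustration under that assumption: `TTLCache` and `fetch_instance_ids` are hypothetical names, not boto3 API, and the fetch function stands in for whatever boto3 call produces the data.

```python
import pickle
import time


class TTLCache:
    """Tiny time-based cache that stores values as pickled bytes (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry time, pickled value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return pickle.loads(entry[1])
        value = fetch()
        # pickle.dumps raises here if the value is not serializable,
        # so store plain data (str, dict, list), not resource objects.
        self._entries[key] = (now + self.ttl_seconds, pickle.dumps(value))
        return value


def fetch_instance_ids():
    # Placeholder for a real call, e.g. collecting instance.id from a boto3 collection.
    return ["i-0001", "i-0002"]


cache = TTLCache(ttl_seconds=30)
ids = cache.get_or_fetch("instance-ids", fetch_instance_ids)
```

This loses the resource's methods, so anything that needs them has to rebuild the resource from the cached identifiers.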

maxrothman avatar Jun 11 '16 03:06 maxrothman

In general, I'd like to improve things with regard to pickling/serializing objects. However, this is going to be challenging to implement given the dynamic nature of resources/instances.

Marking as a feature request. If anyone has any ideas/suggestions they want to share, feel free to chime in.

jamesls avatar Jun 15 '16 19:06 jamesls

This should be possible by giving the classes a __reduce__ method. See this StackOverflow question and the Python docs for more info.
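
A minimal sketch of the idea in plain Python, with no boto3 involved. `make_resource_class` and `rebuild_resource` are hypothetical names standing in for a runtime class factory. They are not boto3 functions, and a real patch would also have to restore the session and client state.

```python
import pickle


def make_resource_class(name):
    """Hypothetical stand-in for a factory that builds classes at runtime."""

    def __init__(self, **identifiers):
        self.identifiers = identifiers

    def __reduce__(self):
        # Hand pickle a module-level callable plus plain arguments, so it
        # never has to look the dynamic class up by name.
        return (rebuild_resource, (name, self.identifiers))

    return type(name, (object,), {"__init__": __init__, "__reduce__": __reduce__})


def rebuild_resource(name, identifiers):
    # Called at unpickling time: regenerate the class, then the instance.
    return make_resource_class(name)(**identifiers)


original = make_resource_class("s3.Object")(bucket_name="example", key="a.txt")
restored = pickle.loads(pickle.dumps(original))
```

Note that `restored` is an instance of a freshly generated class, not the same class object as `original`, so the factory would probably need to cache its classes for `isinstance` checks to keep working.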

maxrothman avatar Jun 15 '16 21:06 maxrothman

I might be interested in contributing a patch for this issue if someone could help orient me in the codebase so I can find the callable that generates resources. @jamesls do you have thoughts on potential challenges in making said patch?

maxrothman avatar Jun 19 '16 04:06 maxrothman

@jamesls ping. Any update on this?

maxrothman avatar Jun 29 '16 14:06 maxrothman

@jamesls ping

maxrothman avatar Jul 18 '16 14:07 maxrothman

I'm working on a patch for this. Is there any way currently that given a resource object you can get a reference to the ServiceContext that was used to create it? Or alternatively, is there a way to create a resource object from raw response JSON?

maxrothman avatar Jul 21 '16 17:07 maxrothman

@jamesls any insight on the above question?

maxrothman avatar Aug 03 '16 15:08 maxrothman

Ping. Is there any way I can get some support on this? I've expressed interest in submitting a patch, but I have some questions (above).

maxrothman avatar Sep 30 '16 22:09 maxrothman

Wow, this is old! The factory pattern (as implemented) screws up easy exception handling and pickling. Looks like this one is thrown in the too hard basket.

E       _pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.ObjectSummary'>: attribute lookup s3.ObjectSummary on boto3.resources.factory failed

A workaround that might somehow find its way into boto3:

# substitute a namedtuple if necessary for py 2.x or earlier 3.x
# https://docs.python.org/3/library/collections.html#collections.namedtuple
import multiprocessing
from dataclasses import dataclass

import boto3


@dataclass(frozen=True)
class S3Object:
    """Just the bucket_name and key for an s3.ObjectSummary
    This simple data class should work around problems with Pickle
    for an s3.ObjectSummary, so if obj is an s3.ObjectSummary, then:
    S3Object(bucket=obj.bucket_name, key=obj.key)
    """
    bucket: str
    key: str

bucket_name = 'example'
s3 = boto3.resource('s3')
s3_bucket = s3.Bucket(bucket_name)

# plain, picklable stand-ins for each s3.ObjectSummary
objects = (
    S3Object(bucket=obj.bucket_name, key=obj.key)
    for obj in s3_bucket.objects.filter(Prefix='example_prefix')
)

# YourProcessor is a placeholder for your own module-level (picklable) callable
with multiprocessing.Pool() as pool:
    processed_objects = pool.map(YourProcessor, objects)

dazza-codes avatar Feb 28 '19 01:02 dazza-codes

Symptoms of the factory pattern gone wrong?

>>> type(objects)
<class 'boto3.resources.collection.s3.Bucket.objectsCollection'>
>>> isinstance(objects, boto3.resources.collection.s3.Bucket.objectsCollection)
E       AttributeError: module 'boto3.resources.collection' has no attribute 's3'
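
Both errors are consistent with the classes being generated at runtime rather than defined as module attributes. The sketch below reproduces the same kind of pickling failure in plain Python, with no boto3 involved. `build_class` is a hypothetical stand-in for the factory, and the exact error text varies by Python version.

```python
import pickle


def build_class(name):
    # type() creates the class, but nothing assigns it to a module attribute,
    # so there is no importable path that leads back to it.
    return type(name, (object,), {})


ObjectSummary = build_class("s3.ObjectSummary")

try:
    # pickle stores instances by class reference (module + qualified name)
    # and verifies that the reference resolves before writing it.
    pickle.dumps(ObjectSummary())
    pickling_error = None
except (pickle.PicklingError, AttributeError) as exc:
    pickling_error = exc

print(type(pickling_error).__name__, pickling_error)
```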

dazza-codes avatar Feb 28 '19 20:02 dazza-codes

  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memoize/__init__.py", line 339, in decorated_function
    timeout=decorated_function.cache_timeout
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memoize/__init__.py", line 82, in set
    self.cache.set(key=key, value=value, timeout=timeout)
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/django/core/cache/backends/memcached.py", line 86, in set
    if not self._cache.set(key, value, self.get_backend_timeout(timeout)):
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memcache.py", line 727, in set
    return self._set("set", key, val, time, min_compress_len, noreply)
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memcache.py", line 1055, in _set
    return _unsafe_set()
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memcache.py", line 1030, in _unsafe_set
    store_info = self._val_to_store_info(val, min_compress_len)
  File "/home/vagrant/home/vagrant/wikirealty/lib/python3.4/site-packages/memcache.py", line 994, in _val_to_store_info
    pickler.dump(val)
_pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.Bucket'>: attribute lookup s3.Bucket on boto3.resources.factory failed

I am getting this error using django-memoize with memcached, so it looks like this is still an issue!

luiscastillocr avatar May 08 '19 22:05 luiscastillocr

A similar issue was resolved in:

  • https://github.com/microsoft/LightGBM/issues/2189
  • http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html

It might help to add a test suite with pickle tests like:

import pickle
import boto3.session
import botocore.session

def test_pickle_botocore_session():
    session = botocore.session.get_session()
    assert pickle.loads(pickle.dumps(session))

def test_pickle_boto3_session():
    session = boto3.session.Session()
    assert pickle.loads(pickle.dumps(session))

Unfortunately they fail:


    def test_pickle_botocore_session():
        session = botocore.session.get_session()
>       assert pickle.loads(pickle.dumps(session))
E       AttributeError: Can't pickle local object '_createenviron.<locals>.encode'

tests/test_clients.py:52: AttributeError
___________________________________________________________________________________ test_pickle_boto3_session ____________________________________________________________________________________

    def test_pickle_boto3_session():
        session = boto3.session.Session()
>       assert pickle.loads(pickle.dumps(session))
E       AttributeError: Can't pickle local object 'lazy_call.<locals>._handler'

dazza-codes avatar Apr 10 '20 20:04 dazza-codes

Any update on this?

SoraDevin avatar Oct 12 '20 05:10 SoraDevin

Any new updates coming this year?

cygniv404 avatar May 20 '21 10:05 cygniv404

This would make it much easier to use the multiprocessing library with boto3. For example, I would like to pass a session object and a list of organization accounts to the Pool.starmap function, which would call a function that gets the tags on each account and merges them into the existing account objects.
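
Until sessions can be pickled, one possible pattern is to pass the arguments needed to build a session instead of the session itself. This is a rough sketch under assumptions: `merge_tags`, `tag_account`, and `tag_accounts` are hypothetical names, the account dicts are assumed to carry an `Id` key, and only the first page of tags is read.

```python
import multiprocessing


def merge_tags(account, tag_list):
    """Return a copy of the account dict with a flat Tags mapping added."""
    merged = dict(account)
    merged["Tags"] = {tag["Key"]: tag["Value"] for tag in tag_list}
    return merged


def tag_account(session_kwargs, account):
    # Imported here so each worker builds its own session; the session
    # itself never crosses the process boundary.
    import boto3

    session = boto3.session.Session(**session_kwargs)
    client = session.client("organizations")
    # First page only; a fuller version would follow NextToken.
    response = client.list_tags_for_resource(ResourceId=account["Id"])
    return merge_tags(account, response["Tags"])


def tag_accounts(session_kwargs, accounts):
    # Every job is a tuple of plain dicts, which pickles without trouble.
    jobs = [(session_kwargs, account) for account in accounts]
    with multiprocessing.Pool() as pool:
        return pool.starmap(tag_account, jobs)
```

It would be called as something like `tag_accounts({"profile_name": "example"}, accounts)` from under an `if __name__ == "__main__":` guard. The cost is one new session per call, which a per-process cache could reduce.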

iainelder avatar Aug 17 '21 13:08 iainelder

Would have used this to avoid globals in multiprocessing.

alexandrosandre avatar Sep 17 '21 14:09 alexandrosandre

Is this issue still being looked at? Would greatly appreciate this being added if at all possible.

MrBeeMovie avatar May 11 '22 14:05 MrBeeMovie

Just reminding that this feature would be useful.

Xezed avatar Aug 24 '22 14:08 Xezed

Any update on this?

dhkim0225 avatar Dec 22 '22 05:12 dhkim0225

The boto3 team has recently announced that the Resource interface has entered a feature freeze and won’t be accepting new changes at this time: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html. We’ll be closing existing feature requests, such as this issue, to avoid any confusion on current implementation status. We do appreciate your feedback and will ensure it’s considered in future feature decisions.

We’d like to highlight that all existing code using resources is supported and will continue to work in Boto3. No action is needed from users of the library.

RyanFitzSimmonsAK avatar Jan 18 '23 22:01 RyanFitzSimmonsAK