support file-like protocol

Open shantanuo opened this issue 8 years ago • 11 comments

It would be great to pickle object directly to S3. Something like this would be helpful.

import bucketstore
bucket = bucketstore.get('bucketstore-playground', create=True)

import pickle
a = {'hello': 'world'}

with open(bucket['foo11'], 'wb') as handle:
    pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)

shantanuo avatar Mar 17 '17 13:03 shantanuo

i'm going to rename this issue

kennethreitz avatar Apr 19 '17 15:04 kennethreitz

@kennethreitz I am currently working on this. I think I will do one of the following:

  1. Make S3Key inherit from something like io.IOBase. Not entirely sure how that will work, but I'll play around with it.
  2. Make some limited __enter__/__exit__ methods that will create an in memory stream that will be written to S3 on exit.

I think option 2 will be the best bet, but an in memory stream might not be the best idea for huge files.

Do you have any thoughts?

eligundry avatar Jun 02 '17 20:06 eligundry

@eligundry I'm not familiar with IOBase, but if it's a good fit, let's go for it!

What does boto normally do for large files?

kennethreitz avatar Jun 07 '17 16:06 kennethreitz

@kennethreitz Boto is really flexible with what it will take. In all the Bucketstore examples, we clearly see that strings are just handled. But Boto will work fine with any file-like object that you throw at it.
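For reference, a minimal sketch of handing a file-like object to boto3 directly. It relies on the S3 client's `upload_fileobj(Fileobj, Bucket, Key)` method, which reads from any binary file-like object. The `upload_pickled` helper and the `FakeClient` stand-in are hypothetical names used only so the snippet runs without AWS credentials; with real credentials, `boto3.client('s3')` would take the place of `FakeClient()`.

```python
import io
import pickle


def upload_pickled(client, bucket_name, key_name, obj):
    """Pickle obj into memory and hand the buffer to an S3 client."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer, protocol=pickle.HIGHEST_PROTOCOL)
    buffer.seek(0)  # rewind so the upload reads from the start
    # upload_fileobj accepts any binary file-like object exposing read().
    client.upload_fileobj(buffer, bucket_name, key_name)


class FakeClient:
    """Hypothetical stand-in for boto3.client('s3') so this runs offline."""

    def __init__(self):
        self.stored = {}

    def upload_fileobj(self, fileobj, bucket, key):
        self.stored[(bucket, key)] = fileobj.read()


fake = FakeClient()
upload_pickled(fake, 'bucketstore-playground', 'foo11', {'hello': 'world'})
restored = pickle.loads(fake.stored[('bucketstore-playground', 'foo11')])
```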

I have been working on this a bit and have deviated slightly from the proposed syntax in the OP issue. What this is going to look like is:

key = bucket.key('foo')
data = {'hello': 'world'}

with key as fh:
    pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)

In this example, the key is automatically uploaded when it exits the with block. I have this somewhat working, I'm just trying to quash some Python 3 related Unicode issues.
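A rough sketch of how the upload-on-exit behavior could work, assuming an in-memory buffer. This is not the actual Bucketstore implementation: `BufferedKey` and its `upload` callable are hypothetical, and a plain dict stands in for S3 so the example runs locally.

```python
import io
import pickle


class BufferedKey:
    """Hypothetical key: buffers writes in memory, uploads once on exit."""

    def __init__(self, name, upload):
        self.name = name
        self._upload = upload  # callable(name, data_bytes); stands in for the S3 call
        self._buffer = None

    def __enter__(self):
        self._buffer = io.BytesIO()
        return self._buffer

    def __exit__(self, exc_type, exc, tb):
        try:
            # Only upload when the with block finished without an exception.
            if exc_type is None:
                self._upload(self.name, self._buffer.getvalue())
        finally:
            self._buffer.close()
            self._buffer = None
        return False  # never swallow exceptions from the with block


uploaded = {}  # stands in for the bucket
key = BufferedKey('foo', uploaded.__setitem__)

with key as fh:
    pickle.dump({'hello': 'world'}, fh, protocol=pickle.HIGHEST_PROTOCOL)
```

Skipping the upload when the block raises avoids writing a half-finished object, at the cost of holding the whole payload in memory until exit.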

eligundry avatar Jun 07 '17 17:06 eligundry

i like it!

kennethreitz avatar Jun 07 '17 21:06 kennethreitz

@kennethreitz I have this super close, but would love to get some feedback regarding assumptions about the datatypes that this library works with. I noticed that Boto3 will always return bytes for all fetch operations, even if you set the value with a unicode string. Ideally, I would love for file-like operations to work similarly (i.e. you give me a string in a with block to write, cool. oh, you gave me bytes, I'll still work). This works perfectly with io.BytesIO in Python 2, but in Python 3, this very much is not an option.

This test with json.dump is holding this feature up. Because json.dump only works with str and will never produce bytes, when it tries to write to io.BytesIO, it'll error out every time.
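The failure can be reproduced without S3 at all. The sketch below shows the `TypeError` on Python 3 and one possible workaround, wrapping the byte buffer in `io.TextIOWrapper` so text is encoded on the way in. The UTF-8 encoding is an assumption made for the example.

```python
import io
import json

data = {'hello': 'world'}

# json.dump writes str chunks, which a bytes buffer rejects on Python 3.
try:
    json.dump(data, io.BytesIO())
    error = None
except TypeError as exc:
    error = exc

# Possible workaround: a text layer that encodes into the byte buffer.
raw = io.BytesIO()
wrapper = io.TextIOWrapper(raw, encoding='utf-8')
json.dump(data, wrapper)
wrapper.flush()  # push buffered text down into raw
payload = raw.getvalue()
```

The catch is that the caller has to know in advance whether text or bytes will be written, which is exactly the decision the context manager would like to avoid.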

At this point, I have a few questions/ideas I'm gonna dump out here:

  1. Maybe streams aren't a good idea? I think switching to tempfile.TemporaryFile is a hacky way to solve this, but it would work (though flushing could get messy).
  2. Have you run into an issue like this in any other of your many projects?
  3. Is assuming bytes for all file inputs a fair thing to enforce for Python 3?
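To make the first and third ideas concrete, here is one possible compromise, sketched under assumptions rather than taken from the library: a `tempfile.SpooledTemporaryFile` subclass that stays in memory until `max_size` is exceeded (then spills to disk) and encodes any `str` it is given. `LenientSpool`, the UTF-8 encoding, and the 1024-byte threshold are all hypothetical choices.

```python
import json
import tempfile


class LenientSpool(tempfile.SpooledTemporaryFile):
    """Hypothetical binary buffer that also accepts str (encoded as UTF-8)."""

    def write(self, data):
        if isinstance(data, str):
            data = data.encode('utf-8')
        return super().write(data)


# Stays in memory up to max_size bytes, then rolls over to a real temp file.
with LenientSpool(max_size=1024) as spool:
    json.dump({'hello': 'world'}, spool)  # str chunks are encoded on write
    spool.write(b'\n')                    # bytes pass straight through
    spool.seek(0)
    contents = spool.read()
```

Reads always come back as bytes, which matches what Boto3 returns for fetches.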

eligundry avatar Jun 09 '17 20:06 eligundry

Hmmmmm

kennethreitz avatar Jun 15 '17 19:06 kennethreitz

Requests dealt with a similar issue — and there's a lot of code in place to compensate for it.

kennethreitz avatar Jun 15 '17 19:06 kennethreitz

assuming bytes for Python3 is sane.

kennethreitz avatar Jun 15 '17 19:06 kennethreitz

closing this due to inactivity

inishchith avatar Feb 14 '19 14:02 inishchith

@inishchith We can keep some issues open that have some discussion, or else we will be left with no open issues. What do you say?

ParthS007 avatar Feb 14 '19 14:02 ParthS007