bucketstore
support file-like protocol
It would be great to pickle objects directly to S3. Something like this would be helpful:
```python
import bucketstore
import pickle

bucket = bucketstore.get('bucketstore-playground', create=True)

a = {'hello': 'world'}
with open(bucket['foo11'], 'wb') as handle:
    pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)
```
i'm going to rename this issue
@kennethreitz I am currently working on this. I think I will either:

- Make `S3Key` inherit from something like `io.IOBase`. Not entirely sure how that will work, but I'll play around with it.
- Make some limited `__enter__`/`__exit__` methods that create an in-memory stream which gets written to S3 on exit (see the sketch below).

I think option 2 will be the best bet, but an in-memory stream might not be the best idea for huge files.
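As a rough sketch of option 2 (hypothetical; the `upload_fileobj` call assumes a boto3 `Bucket` object is attached to the key, which may not match bucketstore's actual internals):

```python
import io


class S3Key:
    """Sketch: buffer writes in memory, flush to S3 when the block exits."""

    def __init__(self, boto_bucket, name):
        self._bucket = boto_bucket  # assumed boto3 Bucket resource
        self.name = name
        self._buffer = None

    def __enter__(self):
        # Hand the caller an in-memory binary stream to write into.
        self._buffer = io.BytesIO()
        return self._buffer

    def __exit__(self, exc_type, exc_value, traceback):
        # Upload only if the block exited without an exception.
        if exc_type is None:
            self._buffer.seek(0)
            # Bucket.upload_fileobj accepts any binary file-like object.
            self._bucket.upload_fileobj(self._buffer, self.name)
        self._buffer.close()
        self._buffer = None
```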
Do you have any thoughts?
@eligundry I'm not familiar with IOBase, but if it's a good fit, let's go for it!
What does boto normally do for large files?
@kennethreitz Boto is really flexible with what it will take. All of the Bucketstore examples pass plain strings, but Boto will work fine with any file-like object that you throw at it.
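For illustration, both of these work (assuming AWS credentials and the `bucketstore-playground` bucket from the original post):

```python
import io

import boto3

s3 = boto3.client('s3')

# Body accepts raw bytes...
s3.put_object(Bucket='bucketstore-playground', Key='foo', Body=b'hello')

# ...or any seekable binary file-like object.
s3.put_object(Bucket='bucketstore-playground', Key='foo',
              Body=io.BytesIO(b'hello'))
```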
I have been working on this a bit and have deviated slightly from the proposed syntax in the OP issue. What this is going to look like is:
```python
import pickle

key = bucket.key('foo')
data = {'hello': 'world'}

with key as fh:
    pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
```
In this example, the key is automatically uploaded when the `with` block exits. I have this somewhat working; I'm just trying to quash some Python 3-related Unicode issues.
i like it!
@kennethreitz I have this super close, but I would love feedback on the datatype assumptions this library makes. I noticed that Boto3 always returns `bytes` for fetch operations, even if you set the key with a unicode string. Ideally, I would love for the file-like operations to work similarly (i.e. you give me a `str` to write in a `with` block, cool; oh, you gave me `bytes`, I'll still work). This works perfectly with `io.BytesIO` in Python 2, but in Python 3 it is very much not an option, since `BytesIO.write()` rejects `str`.
The test with `json.dump` is holding this feature up: because `json.dump` only works with `str` and will never produce `bytes`, it errors out every time it tries to write to an `io.BytesIO`.
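A minimal reproduction of the problem on Python 3:

```python
import io
import json

buf = io.BytesIO()
buf.write(b'raw bytes')  # fine: BytesIO happily accepts bytes

try:
    # json.dump always produces str, never bytes, so this fails on Python 3.
    json.dump({'hello': 'world'}, buf)
except TypeError as exc:
    print(exc)  # a bytes-like object is required, not 'str'
```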
At this point, I have a few questions/ideas I'm gonna dump out here:
- Maybe streams aren't a good idea? I think switching to `tempfile.TemporaryFile` is a hacky way to solve this, but it would work (though flushing could get messy). See the sketch after this list.
- Have you run into an issue like this in any of your many other projects?
- Is assuming `bytes` for all file inputs a fair thing to enforce for Python 3?
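For reference, a rough sketch of the `tempfile.TemporaryFile` idea (hypothetical; it assumes a boto3 `Bucket` object and a bytes payload):

```python
import tempfile


def write_key_via_tempfile(boto_bucket, name, payload):
    """Spool the payload to disk so huge objects never have to fit in RAM."""
    with tempfile.TemporaryFile() as fh:
        fh.write(payload)  # payload must be bytes
        fh.seek(0)
        # upload_fileobj streams from the open binary file handle.
        boto_bucket.upload_fileobj(fh, name)
```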
Hmmmmm
Requests dealt with a similar issue — and there's a lot of code in place to compensate for it.
assuming bytes for Python3 is sane.
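Under that bytes-only contract, callers serialize to `str` and encode it themselves, which sidesteps the `json.dump` problem above. A sketch, reusing the proposed `bucket.key` syntax:

```python
import json

data = {'hello': 'world'}

with bucket.key('foo') as fh:
    # Encode explicitly; the stream only accepts bytes on Python 3.
    fh.write(json.dumps(data).encode('utf-8'))
```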
closing this due to inactivity
@inishchith We could keep some issues that have discussion open, otherwise we'll be left with no open issues. What do you say?