s3_encryption reads entire blob into memory when processing encrypted data which can be problematic for some uses

Open boldfield opened this issue 9 years ago • 10 comments

We can avoid this if we add chunking (and file 'handles') here

Issue added by way of gist by @tedder:

boldfield avatar Jan 29 '16 20:01 boldfield

Note I had to change the chunking.

tedder avatar Apr 16 '16 00:04 tedder

@boldfield

I have a need to pick up this particular issue and update s3-encryption to handle streaming data.

In evaluating solutions for this I've also come across the s3-encryption-client, but because of its KMS usage it does not appear appropriate for my use case.

I need to be able to stream data to and from S3 using client-side encryption in a manner consistent with Amazon's AES256 client-side encryption specifications.

Would you be able to point me to the lines of code that would need to be updated in order to support streaming?

I'd be happy to implement the necessary changes and add a CLI to support streaming encryption/decryption.

Thanks for putting this together.

OAuthBringer avatar Nov 14 '16 18:11 OAuthBringer

My chunking makes the encryption a streaming process; that could be updated to read/write to S3 as streaming, though S3 doesn't like that too much.
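A minimal sketch of what chunked processing can look like, assuming the cipher exposes an incremental `update()`/`finalize()` interface. `XorStandIn` and `transform_chunks` are hypothetical names used only for illustration, and the XOR stand-in is not encryption; it only keeps the example runnable with the standard library. A real block cipher would keep its chaining state across chunks and emit padding in `finalize()`.

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary illustrative chunk size


class XorStandIn:
    """Placeholder with an update()/finalize() shape. NOT real encryption."""

    def __init__(self, key_byte=0x5A):
        self._key_byte = key_byte

    def update(self, data):
        # A real cipher context would encrypt here and keep internal state.
        return bytes(b ^ self._key_byte for b in data)

    def finalize(self):
        # A real cipher context would return any final padded block here.
        return b""


def transform_chunks(fileobj, transformer, chunk_size=CHUNK_SIZE):
    """Yield transformed chunks so the whole blob is never held in memory."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield transformer.update(chunk)
    tail = transformer.finalize()
    if tail:
        yield tail


source = io.BytesIO(b"hello world" * 1000)
# Joined here only to inspect the result; a streaming consumer would not join.
transformed = b"".join(transform_chunks(source, XorStandIn(), chunk_size=1024))
```

Only one chunk is resident at a time inside the generator, which is the property the issue title asks for.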

tedder avatar Nov 14 '16 18:11 tedder

From what I read, boto3.s3.client.put_object does not support streams, so it is not appropriate for my particular use case, where the size of my encrypted data may be arbitrarily large. I looked at your implementation and it does not suit my use case either. You encrypt the file ahead of time, write it to a new file, and then use the .upload_file() boto method.

That is closer as the underlying method supports streaming uploads, but I need to figure out how to handle actual streaming data.

I'm looking into boto3.s3.client.upload_fileobj() and boto3.s3.client.download_fileobj() and am trying an implementation in the S3EncryptionClient.put_object() and S3EncryptionClient.get_object() methods.

OAuthBringer avatar Nov 14 '16 19:11 OAuthBringer

excellent! Tag me if you get it working. I'm only using the tmpfile so I don't have to deal with things like calculating an unknown file size.
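The temp file approach can be sketched roughly as follows, using only the standard library. `spool_to_tempfile` and `read_in_chunks` are hypothetical helper names, and the chunk source stands in for whatever produces the encrypted output; this is an illustration of the idea, not the code from the gist.

```python
import io
import tempfile


def read_in_chunks(fileobj, chunk_size=1024):
    """Yield successive chunks from a binary file-like object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk


def spool_to_tempfile(chunks):
    """Write byte chunks to a temp file and return (file, total_size)."""
    tmp = tempfile.TemporaryFile()
    for chunk in chunks:
        tmp.write(chunk)
    size = tmp.tell()  # total bytes written, known only after the last chunk
    tmp.seek(0)        # rewind so the caller can read from the start
    return tmp, size


source = io.BytesIO(b"x" * 5000)
tmp, size = spool_to_tempfile(read_in_chunks(source))
```

The cost is disk space and a second pass over the data, in exchange for knowing the final size before any upload starts.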

tedder avatar Nov 14 '16 19:11 tedder

So it turns out I was mistaken. put_object appears to support streaming natively.

See https://github.com/boto/boto3/issues/426

I can also confirm that if I retrieve the encrypted key data using boto3, the body value is actually a botocore.response.StreamingBody and not a raw string. Thus streaming is fundamentally possible here and it is actually just being restricted via the encrypt/decrypt logic on s3-encryption.

I'm reasonably sure all I need to do is update each of those functions to recognize a file object and encrypt/decrypt each block as I stream through the File object.
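One way to picture this is a small wrapper that exposes `read()` and transforms data as it is pulled through. This is a sketch under assumptions: `TransformingReader` and `XorStandIn` are hypothetical names, the XOR stand-in is not encryption, and whether a given uploader accepts a non-seekable object like this one would need to be verified.

```python
import io


class XorStandIn:
    """Placeholder with an update()/finalize() shape. NOT real encryption."""

    def update(self, data):
        return bytes(b ^ 0x5A for b in data)

    def finalize(self):
        return b""


class TransformingReader:
    """File-like wrapper that transforms the source stream chunk by chunk."""

    def __init__(self, source, transformer, chunk_size=64 * 1024):
        self._source = source
        self._transformer = transformer
        self._chunk_size = chunk_size
        self._buffer = b""
        self._finished = False

    def _fill(self, wanted):
        # Pull source chunks until the buffer can satisfy the request.
        while not self._finished and (wanted < 0 or len(self._buffer) < wanted):
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._buffer += self._transformer.update(chunk)
            else:
                self._buffer += self._transformer.finalize()
                self._finished = True

    def read(self, size=-1):
        if size is None:
            size = -1
        self._fill(size)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


reader = TransformingReader(io.BytesIO(b"secret data" * 10), XorStandIn(), chunk_size=16)
first = reader.read(5)
rest = reader.read()
```

The internal buffer is there because a real cipher may return more or fewer bytes than it was given on any single `update()` call.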

I'm documenting this as much for my own purposes as for yours. Feel free to ignore me if I am over communicating.

OAuthBringer avatar Nov 14 '16 20:11 OAuthBringer

Thanks for picking this up! I've been meaning to get to this for way too long... and I thoroughly appreciate the updates, don't worry about over communication.

It sounds like you've got pretty much everything you need now, right? If not let me know.

As far as a CLI goes, my initial thought was to develop a separate package for that purpose (boldfield/kms-secrets). At this point it's little more than a standalone script and I haven't had time to make a proper module out of it. My thinking was to keep the CLI separate from the library to cut down on overhead for people who only want the library.

boldfield avatar Nov 15 '16 03:11 boldfield

For my purposes I have to have a CLI so I can use shell piping and redirects to move data to S3 and back while keeping it encrypted at rest. I do not believe a CLI will prove an encumbrance, as I use click for anything I do. Once I figure out this whole problem of mapping a function onto a streaming file-like object, the CLI will be simple and I'll push up a branch that has both. If you don't like the CLI I'll just run things from a forked repository.

OAuthBringer avatar Nov 15 '16 15:11 OAuthBringer

So ultimately I've come around on the CLI thing. I've created my own, built on top of s3-encryption, that handles encrypting file streams. There are some issues with the encryption client and handlers that make them not obviously compatible with what I'm trying to do. In my case I ended up subclassing the client and the encrypt/decrypt handlers and overriding a handful of methods. I then handle file encryption/decryption in an API I built on top of that, which I incorporate into a CLI.

I did this because I need to get a functional version of this very soon and putting it all in my own repository will suffice for the moment.

Once I run the licensing issues past my CTO I plan to open source the CLI and I'll post a link to it here.

If either of you have feedback on how to incorporate the API I created more tightly into s3-encryption, I'd welcome it.

OAuthBringer avatar Nov 16 '16 20:11 OAuthBringer

How is the licensing issue and the CTO discussion going?

tolidano avatar Oct 01 '17 03:10 tolidano