s3-encryption
s3_encryption reads the entire blob into memory when processing encrypted data, which can be problematic for some use cases.
We can avoid this if we add chunking (and file 'handles') here.
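As a rough illustration of the chunking idea, here is a minimal sketch assuming the data is reachable through a binary file handle. The name iter_chunks and the 64 KiB default are illustrative assumptions, not part of s3_encryption.

```python
import io
from typing import BinaryIO, Iterator

# Assumed default block size; nothing in s3_encryption prescribes this value.
CHUNK_SIZE = 64 * 1024


def iter_chunks(handle: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive blocks from a binary file handle until EOF.

    Only one block is held in memory at a time, so peak memory use is
    bounded by chunk_size rather than by the size of the whole blob.
    """
    while True:
        block = handle.read(chunk_size)
        if not block:  # an empty bytes object signals end of stream
            break
        yield block
```

Any object with a read(n) method works here, for example an open file or an io.BytesIO buffer.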
Issue added by way of gist by @tedder:
@boldfield
I have a need to pick up this particular issue and update s3-encryption to handle streaming data.
In evaluating solutions for this I've also come across s3-encryption-client, but because of its KMS usage it does not appear appropriate for my use case.
I need to be able to stream data to and from S3 using client-side encryption in a manner consistent with Amazon's AES256 client-side encryption specifications.
Would you be able to point me to the lines of code that would need to be updated in order to support streaming?
I'd be happy to implement the necessary changes and add a CLI to support streaming encryption/decryption.
Thanks for putting this together.
My chunking makes the encryption a streaming process; that could be updated to read/write to S3 as streaming, though S3 doesn't like that too much.
From what I read, boto3.s3.client.put_object does not support streams, so it is not appropriate for my particular use case, where the size of my encrypted data may be arbitrarily large. I looked at your implementation and it does not suit my use case either. You are encrypting the file ahead of time, writing it to a new file, and then using the .upload_file() boto method.
That is closer as the underlying method supports streaming uploads, but I need to figure out how to handle actual streaming data.
I'm looking into boto3.s3.client.upload_fileobj() and boto3.s3.client.download_fileobj() and am trying an implementation in the S3EncryptionClient.put_object() and S3EncryptionClient.get_object() methods.
Excellent! Tag me if you get it working. I'm only using the tmpfile so I don't have to deal with things like calculating an unknown file size.
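For anyone following along, the tmpfile trade-off could look roughly like the sketch below: transform the stream block by block into a temporary file, so the final size is known before any upload starts. The helper name spool_transformed and the transform callable are hypothetical stand-ins, not the code from the gist.

```python
import io
import tempfile
from typing import BinaryIO, Callable, Tuple


def spool_transformed(
    source: BinaryIO,
    transform: Callable[[bytes], bytes],
    chunk_size: int = 64 * 1024,
) -> Tuple[BinaryIO, int]:
    """Write transform(block) for every block of source to a temp file.

    Returns the rewound temp file and the number of bytes written, so the
    caller knows the output size without holding the data in memory.
    """
    spooled = tempfile.TemporaryFile()
    size = 0
    while True:
        block = source.read(chunk_size)
        if not block:
            break
        out = transform(block)  # stand-in for the real per-block encryption
        spooled.write(out)
        size += len(out)
    spooled.seek(0)  # rewind so an uploader can read from the start
    return spooled, size
```

The rewound file object is the kind of thing that could then be handed to an upload call; the cost is disk space equal to the size of the encrypted output.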
So it turns out I was mistaken. put_object appears to support streaming natively.
See https://github.com/boto/boto3/issues/426
I can also confirm that if I retrieve the encrypted key data using boto3, the body value is actually a botocore.response.StreamingBody and not a raw string. Thus streaming is fundamentally possible here; it is just being restricted by the encrypt/decrypt logic in s3-encryption.
I'm reasonably sure all I need to do is update each of those functions to recognize a file object and encrypt/decrypt each block as I stream through the File object.
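A minimal sketch of that idea, assuming the source is any object with a read(n) method (an open file, or a body that behaves like one). TransformingReader and xor_stand_in are hypothetical names, and the XOR function is only a placeholder so the example runs without a crypto dependency; it is not real encryption.

```python
import io
from typing import BinaryIO, Callable


def xor_stand_in(block: bytes) -> bytes:
    """Placeholder transform (NOT encryption): XOR every byte with 0x2A."""
    return bytes(b ^ 0x2A for b in block)


class TransformingReader:
    """File-like wrapper that applies transform to each block it reads."""

    def __init__(self, source: BinaryIO, transform: Callable[[bytes], bytes],
                 chunk_size: int = 64 * 1024) -> None:
        self._source = source
        self._transform = transform
        self._chunk_size = chunk_size
        self._buffer = b""
        self._eof = False

    def read(self, size: int = -1) -> bytes:
        read_all = size is None or size < 0
        # Pull blocks from the source until the request can be satisfied
        # or the source is exhausted.
        while not self._eof and (read_all or len(self._buffer) < size):
            block = self._source.read(self._chunk_size)
            if not block:
                self._eof = True
                break
            self._buffer += self._transform(block)
        if read_all:
            out, self._buffer = self._buffer, b""
        else:
            out, self._buffer = self._buffer[:size], self._buffer[size:]
        return out
```

A real cipher is harder than this placeholder suggests: a block cipher mode with padding needs one stateful cipher object for the whole stream and must pad only the final block, so the per-block function cannot be fully independent of its position in the stream.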
I'm documenting this as much for my own purposes as for yours. Feel free to ignore me if I am over communicating.
Thanks for picking this up! I've been meaning to get to this for way too long... and I thoroughly appreciate the updates, don't worry about over communication.
It sounds like you've got pretty much everything you need now, right? If not let me know.
As far as a CLI goes, my initial thought was to develop a separate package for that purpose (boldfield/kms-secrets). At this point it's little more than a standalone script and I haven't had time to make a proper module out of it. My thinking was to keep the CLI separate from the library to cut down on overhead for people who only want the library.
For my purposes I have to have a CLI so I can use shell piping and redirects to move data to S3 and back in an encrypted-at-rest state. I do not believe a CLI will prove an encumbrance, as I use click for anything I do. Once I figure out this whole problem of mapping a function over a streaming file-like object, the CLI will be simple and I'll push up a branch that has both. If you don't like the CLI I'll just run things from a forked repository.
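To make the piping use case concrete, here is a rough sketch of a stdin-to-stdout filter. It uses argparse from the standard library rather than click, purely to stay dependency-free, and the names pipe, main and xor_stand_in are hypothetical. The XOR transform is a placeholder, not real encryption.

```python
import argparse
import io
import sys
from typing import BinaryIO, Callable


def xor_stand_in(block: bytes) -> bytes:
    """Placeholder transform (NOT encryption): XOR every byte with 0x2A."""
    return bytes(b ^ 0x2A for b in block)


def pipe(src: BinaryIO, dst: BinaryIO, transform: Callable[[bytes], bytes],
         chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst block by block through transform; return bytes read."""
    total = 0
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(transform(block))
        total += len(block)
    return total


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Stream stdin to stdout.")
    parser.add_argument("mode", choices=["encrypt", "decrypt"])
    parser.add_argument("--chunk-size", type=int, default=64 * 1024)
    args = parser.parse_args(argv)
    # Both modes map to the same placeholder; a real tool would plug in
    # separate encrypt and decrypt handlers here.
    transforms = {"encrypt": xor_stand_in, "decrypt": xor_stand_in}
    pipe(sys.stdin.buffer, sys.stdout.buffer, transforms[args.mode], args.chunk_size)
    sys.stdout.buffer.flush()
    return 0
```

Wired up as a script entry point, main would let the tool sit in the middle of a shell pipeline, reading plaintext on stdin and writing the transformed stream to stdout.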
So ultimately I've come around on the CLI thing. I've created my own, built on top of s3-encryption, that handles encrypting file streams. There are some issues with the encryption client and handlers that make them not obviously compatible with what I'm trying to do. In my case I ended up subclassing the client and the encrypt/decrypt handlers and overriding a handful of methods. I then handle file encryption/decryption in an API I built on top of that, which I incorporate into a CLI.
I did this because I need to get a functional version of this very soon and putting it all in my own repository will suffice for the moment.
Once I have run the licensing issues past my CTO, I plan to open source the CLI and I'll post a link to it here.
If either of you has feedback on how to incorporate the API I created more tightly into s3-encryption, I'd welcome it.
How is the licensing issue and the CTO discussion going?