
Split files into chunks before encrypting to mitigate potential attacks on file size

Open • andreiled opened this issue 4 years ago • 5 comments

As indicated in this comment, the fact that each file is encrypted as a whole leaves the following details exposed after encryption:

  • the exact number of files in the encrypted volume
  • the almost exact size of each file (with block ciphers like AES-256, the ciphertext reveals the file size in whole blocks, though not down to the exact byte)

The suggested solution is to split each file into chunks of a fixed size, making sure to pad the last chunk with random data (just as the last block is padded in block cipher implementations). Additionally, it would be a good idea to upload the chunks to the cloud storage in parallel and in a somewhat randomized order, to hinder "sort all chunks by creation date" attacks.
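As a rough illustration only (hypothetical Go helper code, not tied to prvt's actual implementation), the splitting with random padding and a randomized upload order could look something like this:

```go
package chunker

import (
	"crypto/rand"
	"io"
	"math/big"
	"os"
)

// chunkSize is an arbitrary example value; the actual size is an open design choice.
const chunkSize = 1 << 20 // 1 MiB

// splitIntoChunks reads a file into fixed-size chunks, padding the last
// (partial) chunk with random bytes so all chunks are the same length.
// The real file length would have to be stored elsewhere (e.g. encrypted
// in the index) so the padding can be stripped on decryption.
func splitIntoChunks(path string) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var chunks [][]byte
	for {
		buf := make([]byte, chunkSize)
		n, err := io.ReadFull(f, buf)
		switch err {
		case nil:
			chunks = append(chunks, buf)
		case io.ErrUnexpectedEOF:
			// Last, partial chunk: fill the remainder with random padding
			if _, err := rand.Read(buf[n:]); err != nil {
				return nil, err
			}
			return append(chunks, buf), nil
		case io.EOF:
			// File ended exactly on a chunk boundary
			return chunks, nil
		default:
			return nil, err
		}
	}
}

// uploadOrder returns chunk indexes in a random order, so that object
// creation timestamps don't reveal the original chunk sequence.
func uploadOrder(numChunks int) ([]int, error) {
	order := make([]int, numChunks)
	for i := range order {
		order[i] = i
	}
	// Fisher-Yates shuffle using crypto/rand
	for i := numChunks - 1; i > 0; i-- {
		j, err := rand.Int(rand.Reader, big.NewInt(int64(i+1)))
		if err != nil {
			return nil, err
		}
		k := int(j.Int64())
		order[i], order[k] = order[k], order[i]
	}
	return order, nil
}
```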

As explained by @ItalyPaleAle in this reply, this could be considered an acceptable compromise given the complexities involved in addressing it.

andreiled avatar Jul 04 '20 20:07 andreiled

Thanks for bringing this issue up.

After your comment, I updated the Encryption document to explain that this is indeed a potential threat, but that at the moment it is accepted by design and its risk is considered manageable.

The solution you're proposing would indeed work. It's something I've already considered, although I do see two issues with it:

First, as you stated yourself, this would waste more storage space because of padding.
To limit the amount of wasted space we could make the chunks smaller (e.g. 64KB, which is also the size of each chunk in the DARE format), but this would make the second issue below even more critical.
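(Back-of-the-envelope, assuming file sizes are roughly uniform modulo the chunk size: padding wastes on average about half a chunk per file, i.e. roughly 512KB per file with 1MB chunks versus roughly 32KB per file with 64KB chunks, at the cost of 16x as many chunks per file.)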

The second issue will be more complex to solve, and it's about the way the index works. At the moment, we maintain an index file which contains the list of all files in the storage (for each file, in version 0.4 we store the decrypted filename, the encrypted filename, the file type, and the creation date). This is currently a single file, and I am aware that as the repo grows, the index becomes larger and larger, which could lead to a variety of issues (not just performance ones). To mitigate that, in version 0.4 I migrated the index from JSON to a file encoded with protocol buffers.
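Just to make the discussion concrete, this is roughly what each index entry holds per the description above (a hypothetical Go sketch with made-up field names, not prvt's actual protobuf schema):

```go
package index

import "time"

// IndexElement is a sketch of the per-file metadata described above.
type IndexElement struct {
	Path        string    // decrypted (clear-text) file name/path
	EncryptedID string    // name of the encrypted object in the data folder
	MimeType    string    // file type
	Added       time.Time // creation date
}

// Index is the full list of entries; the whole structure is serialized
// (with protocol buffers as of version 0.4) and encrypted before upload.
type Index struct {
	Elements []IndexElement
}
```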

If we start adding chunks, we'll almost certainly need to rework the way the index is designed, as it could grow too large to be manageable. Of course, creating a separate index for each file would defeat the purpose, so that won't happen.

Because the index file needs to be able to be stored in object storage services, whatever format we choose needs to satisfy the following requirements:

  1. It needs to be compact, because clients will be uploading it frequently
  2. It needs to be variable in length, so that by looking at the total size of the (encrypted) index you can't tell how many files are in the repo. That is: we can't use fixed-size fields. Of course, one will always be able to assume that a larger index means more files (or chunks) are stored, but that's the same as looking at the size of the data folder.
  3. It needs to be robust, so that if a client crashes halfway through updating the index, there's no risk of the file getting corrupted (range requests can be dangerous)

I have not yet found an effective solution to the problem above. I'm fairly confident that whatever the solution turns out to be, it will need to leverage multiple files (for point 1), so things like a SQLite database won't work.
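Purely as a sketch of one possible direction (not something prvt implements today), entries could be sharded across a fixed number of smaller index files by a keyed hash of the path, so that a change only requires re-uploading the affected shard:

```go
package index

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// shardFor picks which of numShards index files an entry belongs to.
// Keying the hash with a secret prevents an observer from mapping
// known paths to shards.
func shardFor(secretKey []byte, path string, numShards uint32) string {
	mac := hmac.New(sha256.New, secretKey)
	mac.Write([]byte(path))
	sum := mac.Sum(nil)
	return fmt.Sprintf("index-%03d", binary.BigEndian.Uint32(sum[:4])%numShards)
}
```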

I am open to suggestions however :)

PS: one small correction here:

the almost exact size of each file (with block ciphers like AES-256, the ciphertext reveals the file size in whole blocks, though not down to the exact byte)

In addition to that, because we use the DARE format there's a small overhead (they claim ~0.05%) as each 64KB chunk has a header. This can be calculated deterministically, however.
The other thing is that prvt itself adds a header, which is of variable length and can be at most 256+1024 bytes.
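(For reference, assuming the DARE format's documented per-package overhead of a 16-byte header plus a 16-byte authentication tag for every 64KB payload: 32 / 65,536 ≈ 0.049%, which is where the ~0.05% figure comes from.)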

ItalyPaleAle avatar Jul 04 '20 21:07 ItalyPaleAle

Found another disadvantage of splitting files into chunks after exploring AWS S3 pricing: each PUT object operation costs $0.005, so uploading a 1GB file split into 1MB chunks would cost $5

andreiled avatar Jul 05 '20 02:07 andreiled

Actually that’s the price for 1,000 requests, so uploading that 1GB file would only be $0.005 ;)
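(The math, using the pricing quoted above: a 1GB file in 1MB chunks is about 1,024 PUT requests, and 1,024 / 1,000 × $0.005 ≈ $0.0051.)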

ItalyPaleAle avatar Jul 05 '20 02:07 ItalyPaleAle

Oh, I didn't notice the "per 1,000 requests" part 🤦 $0.005 to upload a 1GB file looks so much better.

andreiled avatar Jul 05 '20 03:07 andreiled

In any case, if you (or anyone else) have suggestions on how to best implement the index, please feel free to speak up! (And PRs are welcome too)

ItalyPaleAle avatar Jul 05 '20 07:07 ItalyPaleAle