
Stream Cipher Used to Encrypt Last File Block

Open lipnitsk opened this issue 11 years ago • 54 comments

From: https://defuse.ca/audits/encfs.htm

Exploitability: Unknown Security Impact: High

As reported in [1], EncFS uses a stream cipher mode to encrypt the last file block. The change log says that the ability to add random bytes to a block was added as a workaround for this issue. However, it does not solve the problem, and is not enabled by default.

EncFS needs to use a block mode to encrypt the last block.

EncFS's stream encryption is unorthodox:

1. Run "Shuffle Bytes" on the plaintext.
    N[J+1] = Xor-Sum(i = 0 TO J) { P[i] }
    (N = "shuffled" plaintext value, P = plaintext)
2. Encrypt with (setIVec(IV), key) using CFB mode.
3. Run "Flip Bytes" on the ciphertext.
    This reverses bytes in 64-byte chunks.
4. Run "Shuffle Bytes" on the ciphertext.
5. Encrypt with (setIVec(IV + 1), key) using CFB mode.

Where setIVec(IV) = HMAC(globalIV || (IV), key), and,
    - 'globalIV' is an IV shared across the entire filesystem.
    - 'key' is the encryption key.
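
The two mangling passes are simple enough to sketch. Here is a rough Python rendering of what "Shuffle Bytes" and "Flip Bytes" compute, based purely on the description above (function names are mine, not EncFS's). Note that on a one-byte input every pass is the identity, which is exactly what the advisory quoted below exploits.

```python
def shuffle_bytes(buf: bytes) -> bytes:
    # Running XOR: out[i] = buf[0] ^ buf[1] ^ ... ^ buf[i]
    out = bytearray(buf)
    for i in range(len(out) - 1):
        out[i + 1] ^= out[i]
    return bytes(out)

def unshuffle_bytes(buf: bytes) -> bytes:
    # Inverse of shuffle_bytes: undo the running XOR from the end.
    out = bytearray(buf)
    for i in range(len(out) - 1, 0, -1):
        out[i] ^= out[i - 1]
    return bytes(out)

def flip_bytes(buf: bytes) -> bytes:
    # Reverse the byte order within each 64-byte chunk (an involution).
    out = bytearray()
    for off in range(0, len(buf), 64):
        out += buf[off:off + 64][::-1]
    return bytes(out)
```

On a single byte, shuffle, unshuffle and flip all return their input unchanged, so for a one-byte last block the two CFB passes are the only protection left.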

This should be removed and replaced with something more standard. As far as I can see, this provides no useful security benefit, however, it is relied upon to prevent the attacks in [1]. This is security by obscurity.

Edit: [1] may be unavailable, so here it is from archive.org:

[Full-disclosure] Multiple Vulnerabilities in EncFS
From: Micha Riser (micha[at]povworld.org)
Date: Thu Aug 26 2010 - 07:05:18 CDT
(...)
3. Last block with single byte is insecure 
------------------------------------------------------- 
The CFB cipher mode is insecure if it is used twice with the same 
initialization vector. In CFB, the first block of the plain text is XOR-ed with 
the encrypted IV: 
  C0 = P0 XOR Ek (IV ) 
Therefore, for two cipher blocks C0 and C0' encrypted with the same IV, it 
holds that: 
  C0 XOR C0' = (P0 XOR Ek (IV )) XOR (P0' XOR Ek (IV )) = P0 XOR P0' 
This means that an attacker gets the XOR of the two plain texts. EncFs uses a 
modified version of CFB which additionally shuffles and reverses bytes. It is not 
clear however, if the modifications generally help against this problem. 

A security problem arises definitely if the last block contains only a single 
byte and an attacker has two versions of the last block. Operating on a single 
byte, the shuffle and reverse operation do nothing. What remains is a double 
encryption with CFB and XOR-ing the two cipher bytes gives the XOR of the 
two plain text bytes due to the reason described above. Encrypting the last 
block with a stream cipher instead of a block cipher saves at most 16 bytes 
(one cipher block). We think it would be better to sacrifice these bytes and in 
exchange rely only on a single encryption mode for all blocks which simplifies 
both the crypto analysis and the implementation.
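
The first-block leak described in the advisory is easy to reproduce with a toy CFB, using an HMAC keystream as a stand-in for Ek (this is not real AES-CFB; it only models the structure C0 = P0 XOR Ek(IV)):

```python
import hashlib
import hmac

def toy_cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Toy CFB: keystream block = PRF(key, previous ciphertext block),
    # with the first block keyed by the IV. A stand-in for E_k, NOT AES.
    out = bytearray()
    prev = iv
    for off in range(0, len(plaintext), 16):
        ks = hmac.new(key, prev, hashlib.sha256).digest()[:16]
        ct = bytes(p ^ k for p, k in zip(plaintext[off:off + 16], ks))
        out += ct
        prev = ct
    return bytes(out)

key, iv = b'k' * 16, b'i' * 16
p1 = b'attack at dawn!!'
p2 = b'defend at dusk!!'
c1 = toy_cfb_encrypt(key, iv, p1)
c2 = toy_cfb_encrypt(key, iv, p2)
xor_ct = bytes(a ^ b for a, b in zip(c1, c2))
xor_pt = bytes(a ^ b for a, b in zip(p1, p2))
assert xor_ct == xor_pt  # C0 XOR C0' = P0 XOR P0'
```

Only the first block leaks this way in CFB, since later keystream blocks depend on the preceding ciphertext; but when the last block is a single byte, that first block is the whole message.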

lipnitsk avatar Aug 26 '14 06:08 lipnitsk

Plan is to eliminate use of stream mode entirely in Encfs 2.x (for new filesystems). No plan for Encfs 1.x

vgough avatar Aug 29 '14 03:08 vgough

Do you already have a plan for what mode to use? CBC with ciphertext stealing seems to be a good option.

rfjakob avatar Oct 18 '14 09:10 rfjakob

The other option would be to go with CTR for the whole file. With CTR, however, an attacker can flip single bits at will, so it would need MACs enabled by default. If eCryptfs has MACs enabled by default (will check), we probably should too.
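
The bit-flipping malleability is easy to see with a toy CTR mode (an HMAC keystream standing in for AES-CTR; what matters is only the structure, ciphertext = plaintext XOR keystream):

```python
import hashlib
import hmac

def toy_ctr(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy CTR: keystream block = PRF(key, nonce || block offset).
    # XOR-based, so encryption and decryption are the same operation.
    out = bytearray()
    for off in range(0, len(data), 16):
        ks = hmac.new(key, nonce + off.to_bytes(8, 'big'),
                      hashlib.sha256).digest()[:16]
        out += bytes(d ^ k for d, k in zip(data[off:off + 16], ks))
    return bytes(out)

key, nonce = b'k' * 16, b'n' * 8
pt = b'pay alice $0001!'
ct = bytearray(toy_ctr(key, nonce, pt))
ct[14] ^= 0x01               # attacker flips one ciphertext bit...
tampered = toy_ctr(key, nonce, bytes(ct))
assert tampered == b'pay alice $0000!'  # ...and the same plaintext bit flips
```

Without a MAC the receiver cannot detect this, which is why CTR (or any stream-style mode) needs authentication enabled by default.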

rfjakob avatar Oct 18 '14 09:10 rfjakob

CTR has the additional problem that the XOR of two ciphertext files copied at two different times is the XOR of the plaintexts. To fix that leak you'd need random per-block IVs.

rfjakob avatar Oct 19 '14 09:10 rfjakob

For Encfs2, I'm leaning towards GCM mode (as used in ZFS).

vgough avatar Oct 23 '14 06:10 vgough

@vgough Salsa20+Poly1305 would also be a viable (and very fast) alternative, as outlined by Thomas Ptacek in his blog: http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/

generalmanager avatar Mar 01 '15 15:03 generalmanager

Actually, I don't think large changes like that are necessary. Blockwise CBC works fine for everything but the last 16 bytes (the AES block size). By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.

rfjakob avatar Mar 01 '15 18:03 rfjakob

Please don't invent a padding scheme; just pad with PKCS#7 like everyone else. :)
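
For reference, PKCS#7 always appends 1 to blockSize bytes, each equal to the pad length, so the pad is self-describing; a minimal sketch:

```python
def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    # Always append n bytes (1 <= n <= block), each with value n.
    n = block - (len(data) % block)
    return data + bytes([n]) * n

def pkcs7_unpad(data: bytes, block: int = 16) -> bytes:
    # The last byte tells us how many pad bytes to strip.
    n = data[-1]
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]
```

The flip side is that the pad length lives in the last plaintext byte, so learning the true file size requires decrypting the tail of the ciphertext.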

lachesis avatar Mar 21 '15 04:03 lachesis

Thanks for the pointer! However, PKCS#7 seems to require that you read the last bytes of the ciphertext to get the plaintext length. This is one additional seek for every stat(); we should really avoid that, as it kills rsync performance.

rfjakob avatar Mar 21 '15 08:03 rfjakob

(It's probably more than one seek, because the filesystem has to parse its internal data structures first to locate the data.) So I think what we need is a "headerless" scheme, where you don't have to read any ciphertext to get the length. Unconditionally adding 16 zero bytes (or any value) would do that:

pppppppppp 0000000000000000
                    ^---- 16 bytes zero padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 0000000000
                     ^--- 10 bytes of zeros
     ^------------------- 16 bytes encrypted data
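
With that scheme the ciphertext is always exactly 16 bytes longer than the plaintext, so the length round-trips through plain size arithmetic; a sketch, assuming the unconditional 16-byte zero pad described above:

```python
PAD = 16  # the unconditional zero padding proposed above

def pad_headerless(plain: bytes) -> bytes:
    # Always append 16 zero bytes, regardless of the plaintext length.
    return plain + b"\x00" * PAD

def strip_headerless(padded: bytes) -> bytes:
    assert padded[-PAD:] == b"\x00" * PAD, "corrupt padding"
    return padded[:-PAD]

def plaintext_size(ciphertext_size: int) -> int:
    # stat() on the encrypted file is enough; no ciphertext read needed.
    return ciphertext_size - PAD
```

This is what keeps stat() (and therefore rsync) cheap: the size mapping is a pure function of the encrypted file size.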

rfjakob avatar Mar 21 '15 09:03 rfjakob

Isn't it a security issue if you know that the last bytes will be (padded with) zero bytes? Maybe random bytes would be better?

djtm avatar May 14 '15 07:05 djtm

Nope, should be fine.

http://en.m.wikipedia.org/wiki/Known-plaintext_attack Modern ciphers such as Advanced Encryption Standard are not currently known to be susceptible to known-plaintext attacks.

rfjakob avatar May 14 '15 21:05 rfjakob

While the modes of modern ciphers available to encfs might not currently be susceptible to known-plaintext attacks, these types of attacks are typical in cryptanalysis, and so this assumption could change after further years of research.

Additionally, encfs offers multiple cipher options. Is this statement true for all ciphers encfs makes available through OpenSSL?

If given two choices for this implementation, are there impacts in choosing one over the other?

  1. Pad with zeros
  2. Pad with any value

RogerThiede avatar May 14 '15 22:05 RogerThiede

Trying to predict how to modify ciphers based on what vulnerabilities might be discovered in the future quickly becomes a wild goose chase. I suspect if you submitted a PR that improved the padding without affecting backwards compat, it would fare better.

akerl avatar May 14 '15 22:05 akerl

A random idea I just thought of: Encode file length (and other small useful metadata) in the encrypted filename. That would reduce the maximum filename length even more than it is now, so if that maximum is reached, substitute a hash of the filename and add the real file name to the end of the file data. That would encode metadata in the file contents only in the (rare) case where the filename is too long, so it wouldn't hurt rsync et al in the common case. And this would resolve the limited filename length problem as well.

JanKanis avatar Jul 21 '15 13:07 JanKanis

In order to make lookups simple, it is preferable that encrypted filenames can be directly computed from plaintext filenames. That way a call to open("foo.txt") doesn't require a directory scan in order to find the encrypted file. Instead, we encrypt "foo.txt" and attempt to open the encrypted name.

Allowing hashed names, to extend allowable file lengths, doesn't hurt too badly since it could still be done without a directory traversal. Encoding metadata into filenames would thwart this, since I'm not aware of any portable way to do a prefix match or otherwise avoid walking the entire directory listing.
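
The lookup property described here — computing the stored name directly from the plaintext name — only requires the name mapping to be deterministic. A toy sketch (a keyed hash stands in for EncFS's real reversible name cipher; it illustrates only the direct-lookup property, not decryptability of names):

```python
import base64
import hashlib
import hmac
import os

def stored_name(key: bytes, name: str) -> str:
    # Deterministic: the same plaintext name always maps to the same
    # stored name, so open() needs no directory scan.
    digest = hmac.new(key, name.encode(), hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")

def open_encrypted(root: str, key: bytes, name: str):
    # One direct path lookup instead of listing the whole directory.
    return open(os.path.join(root, stored_name(key, name)), "rb")
```

Appending per-file metadata (such as a length) to the stored name would break this: the stored name would no longer be computable from the plaintext name alone, forcing a directory walk on every open().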

vgough avatar Jul 24 '15 03:07 vgough

Of course. I should have thought it through a bit longer.

JanKanis avatar Jul 24 '15 09:07 JanKanis

No worries, I appreciate the ideas. I've wanted to do the same myself, just didn't figure out a way to make that work.

vgough avatar Jul 24 '15 10:07 vgough

Is there a chance that there will - maybe ;-) - be a solution for the current version any time soon?

wasgehetdichdasan avatar Aug 02 '15 17:08 wasgehetdichdasan

Is there no one who thinks they can make a quick fix?

wasgehetdichdasan avatar Aug 13 '15 21:08 wasgehetdichdasan

Well, this is an incompatible format change; there is no quick fix, I'm afraid.

rfjakob avatar Aug 14 '15 12:08 rfjakob

Uhh. And what about a non-backwards-compatible version which is not 2.0?

wasgehetdichdasan avatar Aug 18 '15 14:08 wasgehetdichdasan

However, it does not solve the problem, and is not enabled by default.

could you please clarify which commit introduced the fix and which option is used to workaround this issue?

ping @vgough

h0nIg avatar Oct 24 '16 18:10 h0nIg

could you please clarify which commit introduced the fix and which option is used to workaround this issue?

When you configure encfs in expert mode:

Add random bytes to each block header?
This adds a performance penalty, but ensures that blocks
have different authentication codes.  Note that you can
have the same benefits by enabling per-file initialization
vectors, which does not come with as great of performance
penalty. 
Select a number of bytes, from 0 (no random bytes) to 8: 

However, as the audit stated, it does not solve the problem.

benrubson avatar Mar 10 '17 09:03 benrubson

Any ideas on what data format should be implemented?

A disk format based on GCM mode could also help to fix the issues related to MAC headers.

danim7 avatar Mar 13 '17 21:03 danim7

This is the most "important" EncFS security report.

By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.

I like the idea @rfjakob. Rather simple, without changing all the existing work. It could be a transitional change, before moving to another whole format. Have you perhaps already started, or would you be willing to work on a patch for this, please?

We would then kill two birds with one stone, as we would then be able to also close #10 👍

benrubson avatar Apr 28 '18 10:04 benrubson

No, sorry, no plans of working on this. People who don't mind a format change can move to gocryptfs IMO.

rfjakob avatar Apr 28 '18 11:04 rfjakob

I'll see if I can work on this later on :) I'm also wondering whether such a modification would be reverse-write compatible.

By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go.

Actually I think that padding with 15 bytes should be OK?

benrubson avatar Apr 28 '18 12:04 benrubson

Right now I don't see why reverse-write would be a problem.

And yes, actually, 15 bytes should be enough.

rfjakob avatar Apr 29 '18 20:04 rfjakob

I then tried to implement the cipherBlockSize - 1 padding (15 bytes in the examples below), according to what you described above @rfjakob, adding 15 bytes at the end of each file (except 0-byte files).

Leading to:

pppppppppp 000000000000000
                    ^---- 15 bytes zero padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                     ^---  9 bytes of zeros
     ^------------------- 16 bytes encrypted data

So, I have a working algorithm in the following situations:

  • normal mode: read and write;
  • reverse mode: read ~~and write~~.

But reverse write seems impossible to achieve correctly. Below are some complicated situations. Let's assume the block size is 4KB.

  • We receive a 4KB write request which comes at the end of the existing backing file. It could then be the last block of the file, so we would have to crop the last 15 bytes after decryption. But we are not sure this is the last block to be written, so we are not sure we should crop...
  • We receive a 1020-byte request which comes at the end of the existing backing file. If we assume this is the last block of the file, we should then crop 12 bytes, decrypt, crop 3 bytes and append. But are we sure this is the last block? Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
  • We receive a 10-byte write request which comes at the end of the existing backing file. Is it some padding? Should we then crop 5 bytes of the previous block? Once again, here we are not sure...

benrubson avatar May 03 '18 12:05 benrubson

I think I will go with OneAndZeroes padding of each block, with a cipherBlockSize - 1 bytes padding for the last block. We would then still be able to get the size of files without having to read the last block. We would also be able to properly reverse-write, at a cost of one byte per blockSize. I think it's worth it.

Any thoughts ?

Thx 👍

Last block:

pppppppppp 100000000000000
                    ^---- 15 bytes OneAndZeroes padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                     ^---  9 bytes of zeros
     ^------------------- 16 bytes encrypted data

Other blocks:

ppppppppppppppp 1
                ^--------  1 byte OneAndZeroes padding
    ^-------------------- 15 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc
     ^------------------- 16 bytes encrypted data
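
A byte-level sketch of that padding, assuming the "1" in the diagrams is the usual one-and-zeroes convention of a 0x80 marker byte (the 1 bit) followed by zero bytes:

```python
def oneandzeroes_pad(data: bytes, total: int) -> bytes:
    # Append the 0x80 marker (the "1"), then zeros, up to `total` bytes.
    assert len(data) < total, "need at least one byte of room for the pad"
    return data + b"\x80" + b"\x00" * (total - len(data) - 1)

def oneandzeroes_unpad(data: bytes) -> bytes:
    # Strip trailing zeros, then exactly one 0x80 marker.
    i = len(data)
    while i > 0 and data[i - 1] == 0x00:
        i -= 1
    assert i > 0 and data[i - 1] == 0x80, "bad padding"
    return data[:i - 1]
```

Unpadding stays unambiguous even when the plaintext itself ends in 0x00 or 0x80 bytes: trailing zeros are stripped first, and the last remaining byte must be the single marker.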

benrubson avatar May 05 '18 14:05 benrubson

Let's look at the difficulties, I think this should all work:

But we are not sure this is the last block to be written, so we are not sure we should crop...

Yes, we have to stat() the file to find out.

But are we sure this is the last block ? Perhaps calling application will come with another write call to complete the 1020 bytes already received...

Again, we can stat() the file to determine if it is the last block. Forward mode has to do this as well, right?

rfjakob avatar May 05 '18 15:05 rfjakob

Another note:

Perhaps calling application will come with another write call to complete the 1020 bytes already received...

This does not matter. In forward mode, the file has to be always consistent on disk. The user application may crash at any time and stop writing. But the data it has already written must be safe.

rfjakob avatar May 05 '18 15:05 rfjakob

Thx for your feedback @rfjakob 👍

I agree if the cipher file is fully available locally. But you may be in a situation where the cipher file is not available locally, so you cannot stat() it (and thus cannot know whether the block you have been asked to write is the last one of the file). Think, for example, about downloading (or syncing, whatever the method used) some remote cipher files directly into a reverse-mounted EncFS.

benrubson avatar May 05 '18 15:05 benrubson

Forward mode would not work either in this case, right?

rfjakob avatar May 05 '18 15:05 rfjakob

It would, because there you encode data, so you don't expect it to be a multiple of cipherBlockSize. If the block you are writing is at the end of the local (cipher) file, you assume this is the last block and compute a cipherBlockSize - 1 bytes padding.

benrubson avatar May 05 '18 15:05 benrubson

I agree if the cipher file is fully available locally.

Can't we stat() the plaintext file instead?

rfjakob avatar May 05 '18 15:05 rfjakob

Unfortunately this would not help. Let's assume we receive a 4KB (blockSize) cipher block. According to the write call received, we have to write it at the end of the plaintext file. Perfect. It could then be the last block of the plain file. But how can we be sure? How can we then remove the last padding bytes that may exist? Without padding every block as proposed above, I don't see how :|

benrubson avatar May 05 '18 15:05 benrubson

If the write expanded the file, it must be the last block, and it must have padding

rfjakob avatar May 05 '18 15:05 rfjakob

(otherwise forward mode is buggy)

rfjakob avatar May 05 '18 15:05 rfjakob

Not necessarily. Think about a cipher file being downloaded directly into a reverse-write EncFS (so that it is written decrypted directly to the local disk). Every block received and written will expand the plain file. But only the last one received (and written) will be the real last block of the plain file.

benrubson avatar May 05 '18 15:05 benrubson

Then every block must have padding.

rfjakob avatar May 05 '18 15:05 rfjakob

A 15-byte padding? Or a OneAndZeroes padding of each block, with a cipherBlockSize - 1 bytes padding for the last block?

benrubson avatar May 05 '18 15:05 benrubson

Yes, 15 bytes.

rfjakob avatar May 05 '18 15:05 rfjakob

At that moment, it's the last block, right?

rfjakob avatar May 05 '18 15:05 rfjakob

Look at these use cases:

Backup: plain local -> EncFS reverse -> rsync to remote location

Restore: rsync from remote location -> EncFS reverse -> plain local

I'm not sure backup will need to insert a 15-byte padding after every block.

benrubson avatar May 05 '18 15:05 benrubson

Interesting use case, but there are other problems:

plain local -> EncFS reverse -> rsync to remote location -> ciphertext

Now, let's assume the ciphertext contains 1000 bytes. And rsync happens to write() a chunk of data that ends with those 1000 bytes. What does

EncFS reverse -> plain local

do?

rfjakob avatar May 05 '18 16:05 rfjakob

// strange duplicate part of your message above deleted

Yes, I think this is the last tricky case. I already thought about this, and I think we need an additional internal buffer.

Let's take your example.

1000 % 16 = 8, so we crop the last 8 bytes. We decode. We remove padding bytes if it looks like we can. We write the plain data at the end of the plain file. We return that we wrote 1000 bytes. As 1000 < 4096, we keep the 1000 bytes in an internal buffer, as we may receive the next bytes of the block.

If we receive a write request with the next 1000 bytes, we will not read the 1000 previous bytes of the block from the plain file, as we have cropped some bytes, but will take them from our internal buffer.

benrubson avatar May 05 '18 16:05 benrubson

I was curious if that use case really works, so I did:

a/zero -> reverse -> b/eNZPWSyw0rxU7T37UwNN3,n9  ----> cp
d/zero -> reverse -> c/eNZPWSyw0rxU7T37UwNN3,n9  <---/

And it seems to work at first glance:

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

So that was easy; let's overwrite the middle of the file with itself:

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123 seek=43 skip=43 count=1

Random garbage:

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
a22fc0525129c3eb2fe1af2e4bc9fd5d  d/zero

rfjakob avatar May 05 '18 16:05 rfjakob

However, this (note the odd block size):

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123

works, and I'm not sure why.

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

On decryption, we have to know if it is the last block, because the last block is handled differently. Where do we have this information from?

rfjakob avatar May 05 '18 16:05 rfjakob

I think every 123-byte block is written using the stream cipher (so this creates garbage), until you are ready to write enough bytes (up to blockSize) to read (stream-encode) them again and re-decode the whole block correctly using CBC.

Confirmed (here blockSize is 1024) :

VERBOSE FileNode::write offset 984, data size 123 [FileNode.cpp:247]
VERBOSE streamRead(data, 984, IV) [CipherFileIO.cpp:350]
VERBOSE Called blockWrite [CipherFileIO.cpp:420]
VERBOSE Called streamWrite [CipherFileIO.cpp:429]

benrubson avatar May 05 '18 16:05 benrubson

Strangely, in your failing example above, the file gets truncated by dd at the end of the 123-byte block written (I reproduced it). There is a bug somewhere :)

benrubson avatar May 05 '18 17:05 benrubson

Oh, my bad! You are right, the truncation is what causes the garbage:

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 \
 bs=123 seek=43 skip=43 count=1 conv=notrunc

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

rfjakob avatar May 05 '18 17:05 rfjakob

What's the current state of this issue?

jcguu95 avatar May 19 '21 13:05 jcguu95