ub-canaries icon indicating copy to clipboard operation
ub-canaries copied to clipboard

*(volatile t *)&local = …; is argued by some to be unreliable, should be a canary

Open pascal-cuoq opened this issue 9 years ago • 9 comments

Here is a POC program:

#include <string.h>

#define L 20

void decrypt(unsigned char * plaintext, unsigned char * ciphertext) {
  unsigned char w[L];
  memcpy(w, ciphertext, L);
  for (int i = 0; i < L; i++) w[i] = w[i] ^ 17;
  for (int i = 1; i < L; i++) w[i] = w[i] + w[i-1];
  memcpy(plaintext, w, L);

  /* now for the cleanup: */
  for (int i = 0; i < L; i++) ((volatile unsigned char *)w)[i] = 0x07;
}

For my version of Clang, I can detect that the cleanup was preserved thus:

~ $ clang -S -O  -Wall t.c && cat t.s | grep \$7
    movb    $7, -32(%rbp)
    movb    $7, -31(%rbp)
    movb    $7, -30(%rbp)
    movb    $7, -29(%rbp)
    movb    $7, -28(%rbp)
    movb    $7, -27(%rbp)
    movb    $7, -26(%rbp)
    movb    $7, -25(%rbp)
    movb    $7, -24(%rbp)
    movb    $7, -23(%rbp)
    movb    $7, -22(%rbp)
    movb    $7, -21(%rbp)
    movb    $7, -20(%rbp)
    movb    $7, -19(%rbp)
    movb    $7, -18(%rbp)
    movb    $7, -17(%rbp)
    movb    $7, -16(%rbp)
    movb    $7, -15(%rbp)
    movb    $7, -14(%rbp)
    movb    $7, -13(%rbp)
~ $ clang -v
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

And the same oracle works for an oldish version of GCC:

$ gcc -std=c99 -S -O  -Wall t.c && cat t.s | grep \$7
    movb    $7, (%rax)
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 

pascal-cuoq avatar Jan 26 '16 15:01 pascal-cuoq

Just a note that although a recent GCC will not remove the 7s from this program, they will do so if the constant 20 is changed to 4 and you compile at -O3.

regehr avatar Jan 26 '16 16:01 regehr

Beware that GCC Explorer hosts C++ compilers only (as of this writing).

I still get the $7 constants indicating that the cleanup code was not removed, even after changing L to 4, with:

  • the same GCC 4.8.4 as above
  • the same Clang 503.0.40 as above
  • GCC 5.2.0

All used as C compilers. On the other hand I was only using -O as commandline flag.

pascal-cuoq avatar Jan 26 '16 16:01 pascal-cuoq

I can reproduce the lack of $7 from the generated code at -O3 with GCC 5 but that does not mean that the volatile has been ignored. Indeed, with -O3 the computation takes place entirely in registers. All the stack slots that have contained the plaintext are successfully cleared because there are no such stack slots:

_decrypt:
LFB0:
    pushq   %rbx
LCFI0:
    movl    (%rsi), %ebx
    xorb    $17, %bl
    movl    %ebx, %eax
    movl    %ebx, %edx
    shrl    $16, %eax
    shrl    $24, %edx
    movl    %eax, %ecx
    movzbl  %bh, %eax
    xorl    $17, %edx
    xorl    $17, %eax
    xorl    $17, %ecx
    addl    %ebx, %eax
    movb    %al, %bh
    addl    %ecx, %eax
    movzbl  %al, %ecx
    movzwl  %bx, %ebx
    addl    %eax, %edx
    sall    $16, %ecx
    sall    $24, %edx
    orl %ecx, %ebx
    movl    %ebx, %eax
    popq    %rbx
LCFI1:
    orl %edx, %eax
    movl    %eax, (%rdi)
    ret

This example, with the constant 4, is playing with fire in that GCC is able to move the entire array w to registers despite the address of w being converted to volatile unsigned char * but it's not an example of GCC doing the wrong thing, per se.

I guess the oracle can be enhanced to accept when the function does not access any offset of esp or ebp at all.

pascal-cuoq avatar Jan 26 '16 18:01 pascal-cuoq

You should mention this registerness to Sam Neves

regehr avatar Jan 26 '16 18:01 regehr

@sneves Hello,

your example http://goo.gl/LDfPHG was so short as to be confusing me, but my thesis is that the difference between x[7] and x[4] is that GCC maps the four elements of x[4] to registers. In this eventuality, it finds itself without an address to access at the time of doing p[i]=0; after having inlined memset_s inside f1. The compiler cannot be said to have generated code that left secrets on the stack. Instead the code does leave secrets in registers. I thought this was less bad than leaving data on the stack, but the secrets can remain in these registers for an arbitrarily long time, and still be there if/when an attacker forces arbitrary code to be executed because of a buffer overflow elsewhere.

Overall, I have to concede that when converting the address of a non-volatile object to pointer to volatile, GCC already does something other than what the programmer intended now in some cases.

pascal-cuoq avatar Jan 26 '16 19:01 pascal-cuoq

But leaving secrets in registers is orthogonal to any use of volatile. I think.

regehr avatar Jan 26 '16 19:01 regehr

Variables in registers makes sense, yeah. Unclear to me what volatile access to those would mean. It's odd, though, that the effect does not happen with x[3], which is also small enough to fit a 32-bit register, nor with x[8], which matches the architecture's register width perfectly.

sneves avatar Jan 26 '16 20:01 sneves

There's clearly a lot going on here, perhaps including one or more compiler bugs.

A volatile variable clearly cannot be placed in a register. Also, any address-taken variable where we cannot see all uses of the address cannot be register-allocated either. It seems only a small step to also conclude that if we can see all uses, but at least one is cast to pointer-to-volatile, this should also suppress register allocation.

A lot of real code depends on pointer-to-volatile acting like the pointed-to object is volatile. This includes things like Linux:

https://lwn.net/Articles/508991/

The standard is irrelevant here.

regehr avatar Jan 26 '16 20:01 regehr

@regehr The data has auto storage. The compiler chose to allocate it in non-volatile storage. The compiler also knows (in theory) that there is no way for another thread to modify it in any way that is supported in the C language. Therefore, there is no reason to treat it as volatile when it is known to be non-volatile.

There's no way to convey ACCESS_ONCE in C. Interestingly, I found another case where ACCESS_ONCE is needed, in a non-operating-system context: https://boringssl.googlesource.com/boringssl/+/master/crypto/cipher/tls_cbc.c#189. HTH.

briansmith avatar Jan 27 '16 08:01 briansmith