ub-canaries
ub-canaries copied to clipboard
*(volatile t *)&local = …; is argued by some to be unreliable, should be a canary
Here is a POC program:
#include <string.h>
#define L 20
void decrypt(unsigned char * plaintext, unsigned char * ciphertext) {
unsigned char w[L];
memcpy(w, ciphertext, L);
for (int i = 0; i < L; i++) w[i] = w[i] ^ 17;
for (int i = 1; i < L; i++) w[i] = w[i] + w[i-1];
memcpy(plaintext, w, L);
/* now for the cleanup: */
for (int i = 0; i < L; i++) ((volatile unsigned char *)w)[i] = 0x07;
}
For my version of Clang, I can detect that the cleanup was preserved thus:
~ $ clang -S -O -Wall t.c && cat t.s | grep \$7
movb $7, -32(%rbp)
movb $7, -31(%rbp)
movb $7, -30(%rbp)
movb $7, -29(%rbp)
movb $7, -28(%rbp)
movb $7, -27(%rbp)
movb $7, -26(%rbp)
movb $7, -25(%rbp)
movb $7, -24(%rbp)
movb $7, -23(%rbp)
movb $7, -22(%rbp)
movb $7, -21(%rbp)
movb $7, -20(%rbp)
movb $7, -19(%rbp)
movb $7, -18(%rbp)
movb $7, -17(%rbp)
movb $7, -16(%rbp)
movb $7, -15(%rbp)
movb $7, -14(%rbp)
movb $7, -13(%rbp)
~ $ clang -v
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
And the same oracle works for an oldish version of GCC:
$ gcc -std=c99 -S -O -Wall t.c && cat t.s | grep \$7
movb $7, (%rax)
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
Just a note that although a recent GCC will not remove the 7s from this program, they will do so if the constant 20 is changed to 4 and you compile at -O3.
Beware that GCC Explorer hosts C++ compilers only (as of this writing).
I still get the $7
constants indicating that the cleanup code was not removed, even after changing L
to 4
, with:
- the same GCC 4.8.4 as above
- the same Clang 503.0.40 as above
- GCC 5.2.0
All used as C compilers. On the other hand I was only using -O
as commandline flag.
I can reproduce the lack of $7
from the generated code at -O3
with GCC 5 but that does not mean that the volatile
has been ignored. Indeed, with -O3
the computation takes place entirely in registers. All the stack slots that have contained the plaintext are successfully cleared because there are no such stack slots:
_decrypt:
LFB0:
pushq %rbx
LCFI0:
movl (%rsi), %ebx
xorb $17, %bl
movl %ebx, %eax
movl %ebx, %edx
shrl $16, %eax
shrl $24, %edx
movl %eax, %ecx
movzbl %bh, %eax
xorl $17, %edx
xorl $17, %eax
xorl $17, %ecx
addl %ebx, %eax
movb %al, %bh
addl %ecx, %eax
movzbl %al, %ecx
movzwl %bx, %ebx
addl %eax, %edx
sall $16, %ecx
sall $24, %edx
orl %ecx, %ebx
movl %ebx, %eax
popq %rbx
LCFI1:
orl %edx, %eax
movl %eax, (%rdi)
ret
This example, with the constant 4
, is playing with fire in that GCC is able to move the entire array w
to registers despite the address of w
being converted to volatile unsigned char *
but it's not an example of GCC doing the wrong thing, per se.
I guess the oracle can be enhanced to accept when the function does not access any offset of esp
or ebp
at all.
You should mention this registerness to Sam Neves
@sneves Hello,
your example http://goo.gl/LDfPHG was so short as to be confusing me, but my thesis is that the difference between x[7]
and x[4]
is that GCC maps the four elements of x[4]
to registers. In this eventuality, it finds itself without an address to access at the time of doing p[i]=0;
after having inlined memset_s
inside f1
. The compiler cannot be said to have generated code that left secrets on the stack. Instead the code does leave secrets in registers. I thought this was less bad than leaving data on the stack, but the secrets can remain in these registers for an arbitrarily long time, and still be there if/when an attacker forces arbitrary code to be executed because of a buffer overflow elsewhere.
Overall, I have to concede that when converting the address of a non-volatile object to pointer to volatile, GCC already does something other than what the programmer intended now in some cases.
But leaving secrets in registers is orthogonal to any use of volatile. I think.
Variables in registers makes sense, yeah. Unclear to me what volatile access to those would mean. It's odd, though, that the effect does not happen with x[3]
, which is also small enough to fit a 32-bit register, nor with x[8]
, which matches the architecture's register width perfectly.
There's clearly a lot going on here, perhaps including one or more compiler bugs.
A volatile variable clearly cannot be placed in a register. Also, any address-taken variable where we cannot see all uses of the address cannot be register-allocated either. It seems only a small step to also conclude that if we can see all uses, but at least one is cast to pointer-to-volatile, this should also suppress register allocation.
A lot of real code depends on pointer-to-volatile acting like the pointed-to object is volatile. This includes things like Linux:
https://lwn.net/Articles/508991/
The standard is irrelevant here.
@regehr The data has auto
storage. The compiler chose to allocate it in non-volatile storage. The compiler also knows (in theory) that there is no way for another thread to modify it in any way that is supported in the C language. Therefore, there is no reason to treat it as volatile when it is known to be non-volatile.
There's no way to convey ACCESS_ONCE
in C. Interestingly, I found another case where ACCESS_ONCE
is needed, in a non-operating-system context: https://boringssl.googlesource.com/boringssl/+/master/crypto/cipher/tls_cbc.c#189. HTH.