s2e-env
Replace LLVM JIT with TCI (Tiny Code Interpreter)
LLVM is very slow to generate and interpret. Consider the following loop: for (int i = 0; i < 10000000; ++i) { a++; } If a is symbolic, the loop will be translated to LLVM and interpreted in KLEE, which will take hours (if not days) to run. Unfortunately, this pattern is quite common (e.g., initializing an array) and doesn't even require a direct dependence on a symbolic value: if some CPU register happens to be symbolic, even if that register is not used by the loop, everything will run in LLVM.
Solution: replace LLVM with TCI. TCI is a backend available in QEMU for hosts that don't support one of the native backends. We need to extend it to support symbolic data.
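To make this concrete, here is a minimal sketch (not S2E code; every name in it is hypothetical) of what extending a TCI-style interpreter to symbolic data could look like: each virtual register carries an optional symbolic expression, concrete operands keep the cheap interpreter path, and symbolic ones build an expression while execution stays inside TCI.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy symbolic expression node, standing in for whatever KLEE expression
 * the real implementation would reference. Purely illustrative. */
typedef struct Expr {
    char op;                 /* 'C' = constant, '+' = addition */
    uint64_t c;
    struct Expr *l, *r;
} Expr;

static Expr *expr_const(uint64_t v)
{
    Expr *e = calloc(1, sizeof(*e));
    e->op = 'C';
    e->c = v;
    return e;
}

static Expr *expr_add(Expr *a, Expr *b)
{
    Expr *e = calloc(1, sizeof(*e));
    e->op = '+';
    e->l = a;
    e->r = b;
    return e;
}

/* A TCI virtual register: a concrete value plus an optional symbolic
 * expression; sym == NULL means the register is purely concrete. */
typedef struct {
    uint64_t val;
    Expr *sym;
} TCIReg;

/* Sketch of one opcode handler (r0 = r1 + r2). Concrete operands keep the
 * cheap interpreter path; symbolic ones build an expression and stay in TCI
 * instead of pushing the whole block into the LLVM interpreter. */
static void tci_add_i64(TCIReg *r0, const TCIReg *r1, const TCIReg *r2)
{
    if (!r1->sym && !r2->sym) {
        r0->val = r1->val + r2->val;
        r0->sym = NULL;
    } else {
        Expr *a = r1->sym ? r1->sym : expr_const(r1->val);
        Expr *b = r2->sym ? r2->sym : expr_const(r2->val);
        r0->sym = expr_add(a, b);
        r0->val = 0;  /* concrete shadow not meaningful on this path */
    }
}
```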
Note that LLVM will still be kept because we need to interpret QEMU's helpers, which are compiled from C code to LLVM (unless we write a TCI backend for clang...).
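For context, a QEMU helper is just a C function implementing an operation that is too awkward to express directly as TCG ops; in S2E these get compiled to LLVM bitcode and interpreted by KLEE. A made-up example in that style (not actual op_helper.c code):

```c
#include <stdint.h>

/* Illustrative stand-in for a helper from op_helper.c: division stays in C
 * because it also has to handle traps (e.g., divide by zero), which is
 * clumsy to express as TCG ops. The name and signature are invented. */
uint32_t helper_example_divu(uint32_t num, uint32_t den)
{
    if (den == 0) {
        /* a real helper would raise a guest exception here */
        return 0;
    }
    return num / den;
}
```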
I did a test in KLEE, and it doesn't seem that slow; is it possible something else caused the slowdown?
KLEE: done: total instructions = 100000025
KLEE: done: completed paths = 2
KLEE: done: generated tests = 2
real    0m36.183s
user    0m36.132s
sys     0m0.034s
Compiled with -O0. Even considering that the assembly -> LLVM IR translation bloats the code by roughly 20x, it should not be this slow...
```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifndef NO_KLEE
#include <klee/klee.h>
#endif

int main(int argc, char* argv[]) {
    int a = 0;
#ifndef NO_KLEE
    klee_make_symbolic(&a, sizeof(a), "__ktr_a");
#endif
    for (int i = 0; i < 10000000; ++i) {
        a++;
    }

    if (a == 10000000) {
        printf("reached case1\n");
    } else {
        printf("reached case2\n");
    }
    return 0;
}
```
You ran that in native KLEE, and 36 seconds is still very slow, I would argue. I suggest you try the same in S2E... It's way worse, because that single instruction (a++) is blown up to 1 or 2 orders of magnitude more code. The reason is that vanilla KLEE operates on clean LLVM bitcode, whereas S2E has to insert CPU emulation code there.
I remember trying to boot MS-DOS by running everything in KLEE; it took 1 hour instead of 3 seconds (a ~1000x slowdown).
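To see where the blow-up comes from, compare the loop body in the two cases: vanilla KLEE interprets the single add that clang emitted, while S2E first lifts the guest binary through the DBT, so the same a++ turns into loads and stores on the emulated CPU state plus lazy flag bookkeeping. A rough sketch of the shape of the lifted code, with purely illustrative field names (not the real libcpu CPUX86State):

```c
#include <stdint.h>

/* Rough picture of what one guest "inc" looks like once the DBT lifts it:
 * guest registers live in an emulated CPU state structure and condition
 * codes are tracked explicitly. Purely illustrative. */
typedef struct {
    uint32_t regs[8];               /* guest general-purpose registers */
    uint32_t cc_dst, cc_op;         /* lazy eflags evaluation state */
} FakeCPUState;

void lifted_inc(FakeCPUState *env, int reg)
{
    uint32_t t0 = env->regs[reg];   /* load the guest register from CPU state */
    uint32_t t1 = t0 + 1;           /* the one instruction we actually wanted */
    env->regs[reg] = t1;            /* store it back */
    env->cc_dst = t1;               /* record result for lazy flag computation */
    env->cc_op = 1;                 /* placeholder for a CC_OP_INCL-style tag */
}
```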
Yes, you're right: KLEE is about 2000x slower and TCG is about 20x. I'm not aware of how TCI performs, but it seems there are still one or two orders of magnitude of room for improvement.
We built a TCI prototype back in 2012; it was about 2-3x slower than native TCG and way faster than KLEE. Unfortunately, we never got the time to clean it up and integrate it. I hope at some point to at least dump the code, if not port it to S2E 2.0.
Btw, we'd still have to use KLEE/LLVM to interpret the QEMU helpers (op_helper.c) unless we write a TCI backend for clang. Luckily, these helpers are not called that often.
Hope to see the code. Maybe it's possible to use QEMU's DBT facility to translate the helpers to TCG IR dynamically, just like guest code, and then interpret them with TCI, but this would definitely be rather tricky...
The existing SE engines are slow enough to make them unusable in many projects. I'm a bit interested in this area; is there currently any plan to release the TCI prototype? @vitalych
@humeafo I am glad you asked :) I will start working on it probably in July. Before that, I need to do some groundwork to make its implementation easier:
- Refactor the engine (100% done)
- Upgrade libtcg/libcpu to the latest code from upstream QEMU (S2E currently uses a version from 2012) [libtcg upgrade 90% done]
- Switch libtcg/libcpu to C++, with proper encapsulation, etc. Currently it's hard to add an additional backend that can run side-by-side with the existing one because of the many global variables and #ifdefs scattered everywhere (rough sketch of the goal below).
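As a rough illustration of the kind of encapsulation this is aiming for (a sketch only, not the planned API), the idea is that each backend bundles its own state and entry points, so a TCI interpreter and the native TCG backend could live in the same process without fighting over globals:

```c
#include <stdint.h>

/* Hypothetical sketch: a backend keeps its data behind an ops table instead
 * of global variables, so two backends can be instantiated side by side.
 * Whether this ends up as a C++ class or a C struct of function pointers is
 * secondary; the point is that nothing backend-specific stays global. */
typedef struct TranslationBlock TranslationBlock;   /* opaque */

typedef struct CodeGenBackend {
    const char *name;
    void *state;                                    /* backend-private data */
    TranslationBlock *(*translate)(void *state, uint64_t guest_pc);
    uintptr_t (*execute)(void *state, TranslationBlock *tb);
} CodeGenBackend;

/* The engine could then pick a backend per execution state, e.g. native TCG
 * while everything is concrete and TCI once symbolic data is involved. */
uintptr_t run_block(CodeGenBackend *be, uint64_t guest_pc)
{
    TranslationBlock *tb = be->translate(be->state, guest_pc);
    return be->execute(be->state, tb);
}
```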