symcc icon indicating copy to clipboard operation
symcc copied to clipboard

symcc fails to solve simple equality constraints

Open yiyunliu opened this issue 3 years ago • 1 comments

The following program reads a long int, assigns the long int to an unsigned long, then branches on whether the unsigned long is equal to the constant 10086:

#include <unistd.h>

int main() {
  long ival;
  unsigned long abs_ival;
  read(STDIN_FILENO, &ival, sizeof ival);

  abs_ival = (unsigned long)ival;
  
  if(abs_ival == 10086) {
    return 9;
  }

  return 10;
}

For this program, symcc is able to generate an input that's equal to 10086 and causes the program to return 9 instead of 10.

On the other hand, if we add in a conditional branch on the sign of ival, symcc fails to generate a byte sequence that corresponds to the constant 10086.

#include <unistd.h>
int main() {
  long ival;
  unsigned long abs_ival;
  read(STDIN_FILENO, &ival, sizeof ival);

  if (ival < 0) {
    abs_ival = 0U - (unsigned long)ival;
  }
  else {
    abs_ival = (unsigned long)ival;
  }
  
  if(abs_ival == 10086) {
    return 9;
  }

  return 10;
}

I'm aware that symcc uses optimistic solving under the hood, but this program is simple and the pure_concolic_execution.sh helper script is stuck at a fixed point of 2 inputs, neither of which corresponds to the constant 10086.

We found this issue when trying to symbolically execute python with symcc. The program above is a fragment of Python's long -> big int conversion function.

yiyunliu avatar Sep 13 '21 23:09 yiyunliu

After investigating this issue further, I found that at -O0 level, the second program actually works.

I looked into the llvm IR of the second program at different optimization levels. At -O0, there is a snippet that compares abs_ival to the constant:

  %44 = call i8* @_sym_build_integer(i64 10086, i8 64)
  %45 = call i8* @_sym_build_equal(i8* %40, i8* %44)

At -O3, I don't see a similar snippet that performs the comparison. That's probably why I don't see the solution of abs_ival == 10086 being generated by symcc.

I recall that according to the paper, symcc inserts its calls to the runtime at an earlier stage of the compilation pipeline so optimization levels shouldn't have a large impact on its behavior. Has that changed in the current implementation?

Here are links to the full LLVM code: -O0: https://gist.github.com/yiyunliu/ae48fbfb0b7fdfc5977ccd7699c07dd0 -O3: https://gist.github.com/yiyunliu/a7e57838150a8f6ca99eff5e98731e2f

yiyunliu avatar Sep 20 '21 23:09 yiyunliu