deepstate icon indicating copy to clipboard operation
deepstate copied to clipboard

Investigate common usage patterns of strcmp, strlen, memcmp, and memcpy

Open pgoodman opened this issue 7 years ago • 0 comments

Things to try.

String comparison

Make a symbolic-execution aware strcmp(a, b), but on the library side. What we're looking for is: is one or or both of a[0] and b[0] symbolic. In the case that one is, then I think what we want to try is produce two forks: one where the return value is 0 and we constrain the bytes of a to be the bytes of b. We probably need solver support here to ask questions like "can this range of memory be the same." The other fork will return a non-zero number in the range [CHAR_MIN, CHAR_MAX]. We can do this via the following pattern:

if (DeepState_Bool()) {
  int ret = DeepState_IntInRange(CHAR_MIN, CHAR_MAX);
  DeepState_Assume(ret != 0);
  // Assume they aren't different, but don't actually constraint the bytes
  // We want to experiment here to see if it's reasonable that program
  // behaviour, actually does the right thing. Perhaps we could also
  // provide a deferred assertion mechanism, e.g. `DeepState_AssumeLater`.
  return ret;
} else {
  // constrain the bytes of one of the sides to be equal to the other
  // then and provide a an interface to overwrite them in the backing
  // memory store, e.g. `DeepState_WriteByte(addr, 0)` to write `0` to
  // `addr`, ignoring any possible errors, and succeeding even if the
  // memory is read-only. 
  return 0;
}

Recall

It is potentially worth it to implement strcmp to have a record/recall mechanism, where we tell the runtime engine about each constant string, e.g. DeepState_RecordString(b) if b was non-symbolic above, then augment symbolic strcmps that forks by comparing against all prior recorded strings.

String lengths

Implement strlen(a) in terms of whether or not a[0] is symbolic. If it is, then provide an interface to get the minimum string length and the maximum string length, and then do a fork where we constrain the lengths, and use the DeepState_WriteByte interface to force in a NUL-terminator.

Lexers

Investigate the code patterns of lexers produced by lex/flex/ragel or yacc/bison/antlr, and perhaps also look into the lexers using by javascript or HTML parsing parsing engines. It's one thing to construct symbolic tokens and benefit from strcmp-like optimizations described above, but it's another to materialize tokens out of thing air based on one's progress in a state machine.

Perhaps one benefit of using C/C++ is that we can do things like: __attribute__((weak)) yylex, and use to give us possible access into the inner workings of parsers against which the program is linked.

pgoodman avatar Dec 08 '17 00:12 pgoodman