smack icon indicating copy to clipboard operation
smack copied to clipboard

Cannot model buffer overflow inside a C struct

Open bitcalc opened this issue 6 years ago • 2 comments

For the following source code:

#include <stdlib.h>
#include "smack.h"

typedef struct {
  int a[10];
  char b;
  int c;
  double d[10];
} mixed;

int main(void) {
  mixed* s = malloc(sizeof(mixed));
  s->a[10] = 0;
  free(s);
  return 0;
}

SMACK fails to report a violation trace. I looked into the LLVM bitcode and Boogie IVL (I know little about LLVM, but am new to Boogie), and found out how SMACK handles it. The generated original LLVM code is as follows (obtained by smack ... -bc mixed.bc):

  %1 = alloca i32, align 4
  %2 = alloca %struct.mixed*, align 8
  store i32 0, i32* %1, align 4
  ...
  %3 = call noalias i8* @malloc(i64 128) #3, !dbg !27
  %4 = bitcast i8* %3 to %struct.mixed*, !dbg !27
  store %struct.mixed* %4, %struct.mixed** %2, align 8, !dbg !26
  %5 = load %struct.mixed*, %struct.mixed** %2, align 8, !dbg !28
  %6 = getelementptr inbounds %struct.mixed, %struct.mixed* %5, i32 0, i32 0
  %7 = getelementptr inbounds [10 x i32], [10 x i32]* %6, i64 0, i64 10 <-- second GEP
  store i32 0, i32* %7, align 8, !dbg !30

But the final LLVM code is (obtained by smack ... -ll mixed.lll):

  %1 = call noalias i8* @malloc(i64 128) #6, !dbg !26, !verifier.code !27
  %2 = bitcast i8* %1 to %struct.mixed*, !dbg !26, !verifier.code !27
  ...
  %3 = getelementptr inbounds %struct.mixed, %struct.mixed* %2, i32 0, i32 0
  %4 = getelementptr inbounds %struct.mixed, %struct.mixed* %2, i32 0, i32 0, i64 10
  %5 = bitcast i32* %4 to i8*
  %6 = bitcast i8* inttoptr (i64 4 to i8*) to i8*
  call void @__SMACK_check_memory_safety(i8* %5, i8* %6)
  store i32 0, i32* %4, align 8, !dbg !45, !verifier.code !27

The related Boogie code is:

  call $p0 := malloc(128);
  ...
  $p1 := $bitcast.ref.ref($p0);
  ...
  $p2 := $p1;
  $p3 := $add.ref($p1, 40);
  $p4 := $bitcast.ref.ref($p3);
  $p5 := $bitcast.ref.ref($i2p.i64.ref(4));
  call __SMACK_check_memory_safety($p4, $p5);
  ...
  $M.0 := $store.i32($M.0, $p3, 0);

There are two issues.

  1. The type parameter of the second getelementptr instruction in the original LLVM bitcode is [10 x i32], but it should be %struct.mixed according to the reference which says: The first argument is always a type used as the basis for the calculations.

  2. SMACK lacks modeling of the internal structure of a C struct. The code shows that the C struct mixed is modeled as a single allocated heap object in Boogie, with the internal structure ignored. The memory check function __SMACK_check_memory_safety only checks if the pointer access is inside the single heap object.

bitcalc avatar Dec 10 '18 19:12 bitcalc

Since I'm new to Boogie, I'd like to venture out to give my thoughts on how to handle the internal structure of a C struct for the exercise purpose.

For each struct, there is a Boogie function that checks if an internal buffer access is out of bound. Mirroring the LLVM getelementptr function, for each pointer calculation, the function is called with the field number and the upper-bound offset accessed. For example, for the above case, I will have:

  // %struct.mixed = type { [10 x i32], i8, i32, [10 x double] }

  %3 = getelementptr inbounds %struct.mixed, %struct.mixed* %2, i32 0, i32 0
  __SMACK_check_struct.mixed_element_address(0, 0);
  %4 = getelementptr inbounds [10 x i32], [10 x i32]* %2, i32 0, i32 0, i64 10
  __SMACK_check_struct.mixed_element_address(0, 44);

procedure __SMACK_check_struct.mixed_element_address(field: int, offset: ref)
{
  assert {:valid_pointer}
    (field == 0 && $sge.ref.bool(offset, $0.ref) && $slt.ref.bool(offset, 40)) ||
    (field == 1 && $sge.ref.bool(offset, 40) && $slt.ref.bool(offset, 41)) ||
    (field == 2 && $sge.ref.bool(offset, 44) && $slt.ref.bool(offset, 48)) ||
    (field == 3 && $sge.ref.bool(offset, 48) && $slt.ref.bool(offset, 128))
}

The type and its function are determined by the indexing of arguments of getelementptr. However, I'm not sure if this is the best way to do it. I'll appreciate if you could comment on it and share your thoughts.

bitcalc avatar Dec 10 '18 19:12 bitcalc

I happened to communicate with @zvonimir, and this is what we've said about the second issue.

@zvonimir Your solution seems to be a step in the right direction. You would have to parse types of all used structures and emit the appropriate Boogie procedures. That should be doable.

Note, however, that you are placing assertions/checks on GEP instructions, instead of where memory actually gets accessed. (GEP does not access memory but just performs pointer arithmetic.) So one could write a piece of code that simply computes an address of a memory location that is out-of-bounds, but this address is never accessed. You would report an error in that case, which is conservative, but it would be a false error since the memory location is never accessed. This might be an acceptable trade-off if it wouldn't happen often in practice.

I agree with you. Besides the case you mentioned above, there is another case where a false error may arise: the program later uses the computed address to subtract some offset to make it legal. So a complete solution involves checking if an address is actually used with its actual value before inserting the memory check function. But in practice, the benefit of achieving such perfection is hard to say useful because, for safe programming practices, we can argue the following:

  1. If an address is computed but never used, then it's a quality bug issue of useless computation.

  2. If an address is intentionally computed out-of-bound and then later restored to inbound, the out-of-bound computation is redundant.

Therefore, the conservative approach helps improving code quality as well. The solution can be enhanced to add warnings for these two cases instead of reporting a buffer overflow.

bitcalc avatar Dec 10 '18 19:12 bitcalc