smack icon indicating copy to clipboard operation
smack copied to clipboard

Implement context-sensitive memory model

Open shaobo-he opened this issue 4 years ago • 3 comments

Motivation

Small functions that modify/access pointers such as strlen, strcmp, and xmalloc cause DSA to merge nodes unnecessarily. For example,

#include <string.h>

char* x = "hello";
char* y = "world";

__attribute__((always_inline))
int mystrlen(char* p) {
  int z = 0;
  while(*p++)
    z += 1;
  return z;
}
int main(void) {
  return strlen(x) + strlen(y);
  //return mystrlen(x) + mystrlen(y);
}

SMACK only generates one region for both x and y whereas if we choose to use mystrlen, SMACK produces two regions. If we adopt a context-sensitive version of sea-dsa, we will get a region for each call site of strlen.

shaobo-he avatar Sep 01 '21 00:09 shaobo-he

Some background information about sea-dsa:

sea-dsa's context-sensitive analysis produces a graph for each function, where each pointer of the function is associated with a cell. Take one of sea-dsa's regression tests as an example,

extern int nd(void);

void f ( int *x , int *y ) {
  *x = 1 ; *y = 2 ;
}

void g (int *p , int *q , int *r , int *s) {
  f (p,q) ;
  f (r,s);
}

int main (int argc, char**argv){

  int x;
  int y;
  int w;
  int z;

  int *a = &x;
  int *b = &y;
  if (nd ()) a = b;
  
  g(a, b, &w, &z);
  return x + y + w + z;
}

Compiling it with O1 and invoking the context-sensitive version of sea-dsa produces the following three graphs.

image

image

image

So a rudimentary attempt to add the context-sensitive memory model is to encode the functions in store-passing style. Namely, each Boogie procedure takes input as the regions which its pointers are associated with and returns the updated regions.

For example, for f, g, and main, their signatures are,

procedure f(x: ref, y: ref, M.in.1: ref[int]) returns (M.out.1: ref[int]);
procedure g(p, q, r, s, M.in.1: ref[int], M.in.2:ref[int]) returns (M.out.1: ref[int], M.out.2: ref[int])
procedure main(M.in.1: ref[int], M.in.2:ref[int]) returns (M.out.1: ref[int], M.out.2: ref[int])

For each procedure, we assign the in version of a region into its out version because parameters are immutable in Boogie. So, for example, we do the following in g:

entry:
M.out.1 := M.in.1;
M.out.2 := M.in.2;

In this way, we always use the out version of a region when encoding load/store instructions.

shaobo-he avatar Sep 08 '21 22:09 shaobo-he