lisa icon indicating copy to clipboard operation
lisa copied to clipboard

[FEATURE REQUEST] Pointer arithmetic

Open lucaneg opened this issue 2 years ago • 0 comments

Memory model

To add support for pointers, we first have to reshape the memory model. Instead of having only the heap, the memory is now partitioned in heap and stack.

HeapStack
h1h2h3h4s1s2s3s4
h5h6h7h8s5s6s7s8

Identifiers

To reflect this change, the hierarchy of the Identifier symbolic expression has to change:

  • Identifier
    • Variable: a real variable of the program
    • MemoryLocation: a generic location in the program memory (hn or sn)
      • StackElement: a location on the stack (sn)
      • HeapLocation: a location in the heap (hn)
    • MaskedIdentifier: a compound identifier referencing another one
      • Address: the address of another identifier
      • OutOfScopeIdentifier: an identifier that has been scoped by a call or similar construct, that the code cannot directly access

Memory Expressions

The HeapExpressions must now accommodate also operations on other parts of the memory:

  • MemoryExpression
    • Allocation: an expression that allocates a new memory region. Corresponds to new/malloc/.... Always rewrites to a MemoryLocation
      • HeapAllocation: an expression that allocates a new heap region. Always rewrites to a HeapLocation
      • StackAllocation: an expression that allocates a new stack region. Always rewrites to a StackElement
    • GetAddress(e): gets the address of an expression e. Corresponds to &e. Always rewrites to an Address. e can be an arbitrary expression.
    • Dereference(e): dereferences an address to get to the memory location. Corresponds to *e. e must be an Address. Always rewrites to an Identifier
    • Traverse(r, e): gets from one location to another one. Corresponds to r.e/r[e]/.... Always rewrites to a MemoryLocation. r must be an Address. e can be a string, a Global, or an arbitrary expression.

Having a generic expression in Traverse and GetAddress enables usage of pointer arithmetics and tracking of fields in dynamic objects. A MemoryDomain can use its SemanticOracle to evaluate the expression, obtaining an instance of Lattice that represents the location accessed.

Memory Domains

The current memory abstractions are Monolithic, ProgramPointBased, and FieldSensitiveProgramPointBased. Monolithic is fine as-is, since everything is the same locations. PogramPointBased does not distinguish between different sub-locations, so it does not require modifications for Traverse. Instead, for GetAddress the evaluation can be reduced to the evaluation of their root location (e.g., &(x+1) is the same of &x). FieldSensitiveProgramPointBased can instead track sub-locations, but at the current state can only track syntactic ones: x.f works if f is a field, but x[n] does not work if n is a variable. This can still be acceptable on some programs that only manipulate objects with statically known fields. While the obvious endgame is to implement a shape analysis, a middle-ground implementation can be this one, that is basically a field sensitive implementation of ProgramPointBased that uses lattice instances for fields.

Still to discuss

  • aliasing vs shallow copy: how do we distinguish?
  • by-ref vs by-val parameter passing

lucaneg avatar Sep 27 '21 17:09 lucaneg