C++ Polymorhpism Not Working As Expected
Background
Consider the following test case:
#include <smack.h>
// @expect verified
class Polygon {
protected:
int width, height;
public:
Polygon( int _width, int _height ): width(_width), height(_height) {
}
virtual int getArea() =0;
};
class Rectangle: public Polygon {
public:
/**
* Constructs a new Rectangle instance with the supplied width and height.
*
* @param _width - The width of the Rectangle.
* @param _height - The height of the Rectangle.
*/
Rectangle( int _width, int _height ): Polygon( _width, _height ) {
}
/**
* Returns the calculated area of this Rectangle instance.
*/
int getArea() {
return width * height;
}
};
int main() {
Polygon *p1;
Rectangle r(4, 5);
assert(r.getArea() == 20);
p1 = &r;
assert(p1->getArea() == 20);
}
When invoking smack.py on this, the first assertion succeeds and the second assertion fails. Since pointer "p1" points to the Rectangle instance, the second assertion should succeed as well.
Further Analysis
When executing the test without assertions and printing out the area using:
std::cout << p1->getArea() << std::endl;
I get the correct return value. So it appears that it is a translation issue between LLVM IR and Boogie. This assumption has also been backed up by comparing the relative addresses of the getArea member functions for the Polygon pointer and Rectangle instance. The assertion:
int (Polygon::*pmfp)() = &Polygon::getArea;
int (Rectangle::*pmfr)() = &Rectangle::getArea;
assert(pmfp == pmfr);
succeeds.
The problem is that we are not yet computing possible targets for virtual method calls — those calls remain untranslated:
// WARNING: unsoundly ignoring indeterminate call: %call1 = call i32 %2(%class.Polygon* %0), !dbg !82
So far we only compute possible targets over functions whose address is taken in the program, which seems to work for C programs — see here: https://github.com/smackers/smack/blob/develop/lib/smack/SmackInstGenerator.cpp#L558-L560
For C++ programs however, we would have to take into account the class hierarchy in order to compute the target — e.g., to know whether Polygon and Rectangle define the required method, and to know which should be called in this case. In general this probably needs to be done dynamically, meaning that the translated program must use extra state to determine the (dynamic) type of object pointed to at any given point in time.
Interesting.. Thanks for the information. Would you have any resources that I could look up on how to do this with LLVM IR? I'm quite new to LLVM in general, so anything can help!
I'd like to extend SMACK to support this feature.
You would need to come up with a Boogie encoding scheme to replace the usual dynamic vtable lookup, e.g., as you can see happening in the Clang-generated LLVM code of your initial example:
define i32 @main() #5 {
entry:
%p1 = alloca %class.Polygon*, align 8
%r = alloca %class.Rectangle, align 8
call void @llvm.dbg.declare(metadata !{%class.Polygon** %p1}, metadata !493), !dbg !495
call void @llvm.dbg.declare(metadata !{%class.Rectangle* %r}, metadata !496), !dbg !497
call void @_ZN9RectangleC1Eii(%class.Rectangle* %r, i32 4, i32 5), !dbg !497
%call = call i32 @_ZN9Rectangle7getAreaEv(%class.Rectangle* %r), !dbg !498
%cmp = icmp eq i32 %call, 20, !dbg !498
%conv = zext i1 %cmp to i32, !dbg !498
call void @__VERIFIER_assert(i32 %conv), !dbg !498
%0 = bitcast %class.Rectangle* %r to %class.Polygon*, !dbg !499
store %class.Polygon* %0, %class.Polygon** %p1, align 8, !dbg !499
%1 = load %class.Polygon** %p1, align 8, !dbg !500
%2 = bitcast %class.Polygon* %1 to i32 (%class.Polygon*)***, !dbg !500
%vtable = load i32 (%class.Polygon*)*** %2, !dbg !500
%vfn = getelementptr inbounds i32 (%class.Polygon*)** %vtable, i64 0, !dbg !500
%3 = load i32 (%class.Polygon*)** %vfn, !dbg !500
%call1 = call i32 %3(%class.Polygon* %1), !dbg !500
%cmp2 = icmp eq i32 %call1, 20, !dbg !500
%conv3 = zext i1 %cmp2 to i32, !dbg !500
call void @__VERIFIER_assert(i32 %conv3), !dbg !500
ret i32 0, !dbg !501
}
Currently I do not know whether there are analyses that we may leverage for computing possible call targets.
Upon further inspection of the output Boogie file, it appears that we are computing possible target calls properly. However, the assertions on these conditions are not being made properly. For example, consider the Boogie procedure that defines the main from above.
procedure {:entrypoint} main()
returns ($r: i32)
{
.
.
.
call $initialize();
call $p0 := $alloc($mul.ref(16, 1)); // allocate the Polygon pointer (*p1)
// WARNING: ignoring llvm.debug call.
assume true;
call _ZN9RectangleC2Eii($p0, 4, 5); // Invoke the Rectangle constructor (w=4, h=5)15} true;
call $i1 := _ZN9Rectangle7getAreaEv($p0); // Invoke the Rectangle's getArea function
$i2 := $eq.i32($i1, 20); // Compare the result returned by getArea to 20 (equality)
$i3 := $zext.i1.i32($i2);
call __VERIFIER_assert($i3); // Verify the equality check above with an assertion
$p4 := $bitcast.ref.ref($p0); // Bit cast the Rectangle reference to a Polygon pointer
call {:cexpr "p1"} boogie_si_record_ref($p4);
$p5 := $bitcast.ref.ref($p4);
$p6 := $load.ref($M.0, $p5);
$p7 := $p6;
$p8 := $load.ref($M.1, $p7);
// WARNING: unsoundly ignoring indeterminate call: %9 = call i32 %8(%class.Polygon* %4), !dbg !82
assume true;
assert $p8 == _ZN9Rectangle7getAreaEv; // **Assert that the reference is the Rectangle's *getArea* **
$i10 := $eq.i32($i9, 20);
$i11 := $zext.i1.i32($i10);
//call __VERIFIER_assert($i11); // ** Disable the old assertion that was failing **
$r := 0;
$exn := false;
return;
}
As you can see above, I disabled the call to __VERIFIER_assert($i11), which is the call that is failing when SMACK is run on the example, and I added the assertion $p8 == _ZN9Rectangle7getAreaEv. This added assertion checks if the reference in $p8 is the same as the constant reference to the global _ZN9Rectangle7getAreaEv. If you run Corral with this procedure modified, then all assertions pass.