Flow sensitive wpa misses alias of global pointer?

Open Qcloud1223 opened this issue 1 year ago • 7 comments

Hi,

I'm having unexpected results when analyzing a complicated program, and the problem boils down to a seemingly simple issue. Consider this very simple program:

int *a;

int main()
{
    int b = 0;
    a = &b;

    return 0;
}

The relevant part of the PAG is also straightforward: image

Andersen's analysis shows *a is an alias of b:

$ wpa -stat=false -ander -print-aliases global-ptr.bc
...
MayAlias var5[a (base object)@] -- var12[@main]

while flow sensitive says no:

$ wpa -stat=false -fspta -print-aliases global-ptr.bc
...
NoAlias var5[a (base object)@] -- var12[@main]

I'm having a hard time understanding this. Even though this is a whole-program analysis, a = &b will eventually be executed, so a flow-sensitive analysis should not overlook it. I believe such behavior could only happen when feeding DDA a context prior to a = &b. Is this indeed unexpected behavior, or am I just mixing concepts up?

Also, is there a simple way to find all aliases of a value? SVF currently has a nice, clean interface for getting the pts and revPts of a value, but the only interface I can find for alias checking is alias(node1, node2), which means traversing all PAG nodes to find every alias.
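
One way to picture an "all aliases" query built on pts and revPts: every pointer that points to some object in pts(p) may alias p. The sketch below is a toy model in plain C, not SVF code. The bitmask encoding, the pts table, and the names build_rev_pts and aliases_of are all made up for illustration, and it assumes "may alias" simply means overlapping points-to sets.

```c
enum { NUM_PTRS = 4, NUM_OBJS = 3 };

/* Hypothetical points-to table: bit o of pts[p] is set
 * when pointer p may point to object o. */
static const unsigned pts[NUM_PTRS] = {
    0x1, /* p0 -> {o0}     */
    0x3, /* p1 -> {o0, o1} */
    0x4, /* p2 -> {o2}     */
    0x2  /* p3 -> {o1}     */
};

/* Reverse map: bit p of rev_pts[o] is set when pointer p may point to o. */
static unsigned rev_pts[NUM_OBJS];

/* Invert the pts table once, so later queries avoid a pairwise scan. */
void build_rev_pts(void)
{
    for (int o = 0; o < NUM_OBJS; ++o) {
        rev_pts[o] = 0;
        for (int p = 0; p < NUM_PTRS; ++p)
            if (pts[p] & (1u << o))
                rev_pts[o] |= 1u << p;
    }
}

/* Return a bitmask of all pointers sharing at least one target with p,
 * excluding p itself. */
unsigned aliases_of(int p)
{
    unsigned result = 0;
    for (int o = 0; o < NUM_OBJS; ++o)
        if (pts[p] & (1u << o))
            result |= rev_pts[o];
    return result & ~(1u << p);
}
```

After build_rev_pts(), aliases_of(1) gives the bitmask for {p0, p3}, since p1 shares o0 with p0 and o1 with p3.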

Qcloud1223 avatar Apr 29 '24 06:04 Qcloud1223

It works for me for both analyses (-fspta and -ander). You could try the below code:

extern void MAYALIAS(void*,void*);
int *a;

int main()
{
    int b = 0;
    a = &b;

    MAYALIAS(a,&b);
    return 0;
}

clang -S -c -emit-llvm ex.c -o ex.ll
wpa -fspta ex.ll

[FlowSensitive] Checking MAYALIAS
	 SUCCESS :MAYALIAS check <id:18, id:12> at ()

yuleisui avatar Apr 29 '24 06:04 yuleisui

Thanks for your reply! I'm able to reproduce this, and the resulting PAG is here:

image

WPA says:

[FlowSensitive] Checking _Z8MAYALIASPvS_
         SUCCESS :_Z8MAYALIASPvS_ check <id:19, id:20> at ()

I can see that nodes 19 and 20 are created as aliases of a and &b, and SVF says they are aliases.

I'm wondering why SVF needs this to work. Also, can I analyze this program without modifying its source code?

Qcloud1223 avatar Apr 29 '24 06:04 Qcloud1223

Here is another finding: using -ander makes pts{5} = {13}, even though node 5 is not a ValVar. Using -fspta gives an empty pts for node 5.

FYI, I'm interested in which object a points to, and I came up with two possible ways:

  1. Check the PTS of node 5. But FlowSensitive generates an empty PTS.
  2. Check the aliases of node 5. But FlowSensitive does not show any alias.

Even when I add a MAYALIAS query (any other function call would work too), I have to traverse the PAG to find the new nodes created for the function call (nodes 19 and 20 in the example above), and only then can I check that they are aliases. But there is still no easy way to know that I should run alias(19, 20)...

Qcloud1223 avatar Apr 29 '24 07:04 Qcloud1223

Here is another finding: using -ander makes pts{5} = {13}, even though node 5 is not a ValVar. Using -fspta gives an empty pts for node 5.

If node 5 is a top-level pointer, it is fine to query its points-to set using pts(5), but if it is an address-taken object, you should query it with a location id, i.e., pts(5, loc).

FYI, I'm interested in which object a points to, and I came up with two possible ways:

  1. Check the PTS of node 5. But FlowSensitive generates an empty PTS.
  2. Check the aliases of node 5. But FlowSensitive does not show any alias.

Even when I add a MAYALIAS query (any other function call would work too), I have to traverse the PAG to find the new nodes created for the function call (nodes 19 and 20 in the example above), and only then can I check that they are aliases. But there is still no easy way to know that I should run alias(19, 20)...

yuleisui avatar Apr 29 '24 09:04 yuleisui

I would suggest the simple approach of always querying top-level pointers rather than address-taken objects. You can do that wherever an object is loaded into a pointer, and then query that pointer. In fact, only top-level pointers/registers are used for aliases and queries in real code.
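
A minimal sketch of that suggestion, reusing the global-pointer program from this thread: the value of a is first loaded into a local pointer p, and p is what gets queried. The function name load_and_check and the empty MAYALIAS stub are assumptions added so the file compiles and runs on its own; the earlier example only declared MAYALIAS as extern, and this sketch assumes the check is recognised by function name.

```c
/* Stub so the file links by itself; assumed to be matched by name,
 * like the extern MAYALIAS declaration used earlier in the thread. */
void MAYALIAS(void *p, void *q) { (void)p; (void)q; }

int *a; /* global: its storage is an address-taken memory object */

/* Hypothetical helper, not part of the original program. */
int load_and_check(void)
{
    int b = 0;
    a = &b;          /* store &b into the memory object of a */

    int *p = a;      /* load: p now holds the value read from a */
    MAYALIAS(p, &b); /* query the loaded pointer, not the object a */

    return p == &b;  /* runtime sanity check of the same fact */
}
```

The query is anchored at the program point of the load, which is where a flow-sensitive result for a is meaningful.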

yuleisui avatar Apr 29 '24 10:04 yuleisui

If node 5 is a top-level pointer, it is fine to query its points-to using pts(5)

Here node 5 is in the PAG above, and that does represent a top-level pointer, i.e., int *a in code.

but if it is an address taken object, you should query using a location id pts(5, loc)

Sorry, I don't really understand what a "location id" is (I guess it's something like a context?). As far as I know, wpa does not take a context as an argument when checking pts, since there is only one final result.

I would suggest a simple way of always querying top-level pointers but not address-taken objects.

That is exactly what I did. However, using -fspta on top-level pointers gives an unexpected result:

# manual breakpoint set after PTA is done
$ gdb --args wpa -stat=false -ander global-ptr.bc
(gdb) p _pta->getPts(5).count()
$1 = 1
# top level pointer points to stack variable
(gdb) p *_pta->getPts(5).begin()
$2 = 13

$ gdb --args wpa -stat=false -fspta global-ptr.bc
# top level variable points to nothing
(gdb) p _pta->getPts(5).count()
$1 = 0

I'm expecting -ander and -fspta to give the same result on getPts(5), but they do not.

Sorry if I've mixed things up in previous posts. I hope now the question is a little clearer.

Qcloud1223 avatar Apr 29 '24 12:04 Qcloud1223

Node 5 can't be queried using pts(5) as it is an object which can be defined multiple times at different program points/locations.

You can only use the APIs below to get its pts: getDFInPtsSet and getDFOutPtsSet.
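
To illustrate why an address-taken object needs a location, here is a toy model in plain C of per-statement IN/OUT points-to sets for a single object a in a straight-line program. It is not SVF's implementation and does not reproduce the signatures of getDFInPtsSet or getDFOutPtsSet; the bitmask encoding, the store_target table, and the solve function are assumptions made up for this sketch.

```c
enum { NUM_STMTS = 3 };

/* Points-to set as a bitmask: bit i set means "may point to object i". */
typedef unsigned pts_t;

/* Hypothetical program: statement s performs "a = &obj[store_target[s]]",
 * or does not write a at all when the entry is -1. */
static const int store_target[NUM_STMTS] = { -1, 1, 2 };

/* Fill in[s] and out[s] with the points-to set of a before and after
 * each statement s. */
void solve(pts_t in[NUM_STMTS], pts_t out[NUM_STMTS])
{
    pts_t cur = 0; /* a points to nothing on entry */
    for (int s = 0; s < NUM_STMTS; ++s) {
        in[s] = cur;
        if (store_target[s] >= 0)
            cur = 1u << store_target[s]; /* strong update: old target is killed */
        out[s] = cur;
    }
}
```

In this model there is no single answer for "the pts of a": it is empty before statement 1, {obj1} after it, and {obj2} after statement 2, so a query has to name the statement it refers to.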

yuleisui avatar Apr 29 '24 12:04 yuleisui