Flow sensitive wpa misses alias of global pointer?
Hi,
I'm getting unexpected results when analyzing a complicated program, and the problem boils down to a seemingly simple issue. Consider this very simple program:
int *a;
int main()
{
int b = 0;
a = &b;
return 0;
}
And the (related part of) PAG is also straightforward:
Andersen's analysis shows *a is an alias of b:
$ wpa -stat=false -ander -print-aliases global-ptr.bc
...
MayAlias var5[a (base object)@] -- var12[@main]
while flow sensitive says no:
$ wpa -stat=false -fspta -print-aliases global-ptr.bc
...
NoAlias var5[a (base object)@] -- var12[@main]
I'm having a hard time understanding this. Since this is a whole-program analysis and a = &b will eventually be executed, I would not expect the flow-sensitive analysis to overlook it. I believe such behavior should only occur when DDA is given a context that precedes a = &b. Is this indeed unexpected behavior, or am I just mixing concepts up?
Also, is there a simple way to find all aliases of a value? SVF currently has a nice and clean interface to get the pts and revPts of a value, but the only interface I can find for alias checking is alias(node1, node2), which would require traversing all PAG nodes to find all aliases.
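One way to avoid the pairwise alias(node1, node2) scan is to combine the two lookups mentioned above: every pointer that appears in the reverse points-to set of something in pts(p) may alias p. The following is only a toy sketch of that idea. ToyPTA, addPts, aliasesOf and makeAliasExample are hypothetical names rather than SVF's API, and the assumption that nodes 5 and 12 both point to object 13 is inferred from the output in this thread.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model only: these types and functions are NOT part of SVF.
using NodeID = unsigned;
using PointsTo = std::set<NodeID>;

struct ToyPTA {
    std::map<NodeID, PointsTo> pts;     // pointer -> objects it may point to
    std::map<NodeID, PointsTo> revPts;  // object  -> pointers that may point to it

    // Record "ptr may point to obj" in both directions.
    void addPts(NodeID ptr, NodeID obj) {
        pts[ptr].insert(obj);
        revPts[obj].insert(ptr);
    }

    // Every other pointer that shares at least one pointee with `ptr`.
    PointsTo aliasesOf(NodeID ptr) const {
        PointsTo result;
        auto it = pts.find(ptr);
        if (it == pts.end()) return result;
        for (NodeID obj : it->second) {
            auto rit = revPts.find(obj);
            if (rit == revPts.end()) continue;
            for (NodeID other : rit->second)
                if (other != ptr) result.insert(other);
        }
        return result;
    }
};

// Assumed facts: nodes 5 and 12 both point to object 13.
// Nodes 30 and 31 are made up to show an unrelated pointer.
inline ToyPTA makeAliasExample() {
    ToyPTA pta;
    pta.addPts(5, 13);
    pta.addPts(12, 13);
    pta.addPts(30, 31);
    assert(pta.pts.size() == 3);  // three pointers recorded
    return pta;
}
```

With this model, makeAliasExample().aliasesOf(5) contains only node 12. The cost is proportional to the sizes of the sets involved rather than to the number of PAG nodes.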
It works for me for both analyses (-fspta and -ander). You could try the below code:
extern void MAYALIAS(void*,void*);
int *a;
int main()
{
int b = 0;
a = &b;
MAYALIAS(a,&b);
return 0;
}
clang -S -c -emit-llvm ex.c -o ex.ll
wpa -fspta ex.ll
[FlowSensitive] Checking MAYALIAS
SUCCESS :MAYALIAS check <id:18, id:12> at ()
Thanks for your reply! I'm able to reproduce this, and the resulting PAG is here:
WPA says:
[FlowSensitive] Checking _Z8MAYALIASPvS_
SUCCESS :_Z8MAYALIASPvS_ check <id:19, id:20> at ()
I can see that nodes 19 and 20 are created for a and &b, and SVF says they are aliases.
I'm wondering why SVF needs this to work. Also, can I analyze this program without modifying its source code?
Here is another finding: using -ander makes pts{5} = {13}, even though node 5 is not a ValVar. Using -fspta gives an empty pts for node 5.
FYI, I'm interested in which object a points to, and I came up with two possible ways:
- Check the PTS of node 5. But FlowSensitive generates an empty PTS.
- Check the aliases of node 5. But FlowSensitive does not show any alias.
Even when I add a MAYALIAS query (any other function call works too), I have to traverse the PAG to find the new nodes created for the call arguments (nodes 19 and 20 in the example above) before I can finally check that they are aliases. And there is still no easy way to know that I should run alias(19, 20)...
Here is another finding: using -ander makes pts{5} = {13}, even though node 5 is not a ValVar. Using -fspta gives an empty pts for node 5.
If node 5 is a top-level pointer, it is fine to query its points-to set using pts(5), but if it is an address-taken object, you should query it with a location id, i.e., pts(5, loc).
FYI, I'm interested in which object a points to, and I came up with two possible ways:
- Check the PTS of node 5. But FlowSensitive generates an empty PTS.
- Check the aliases of node 5. But FlowSensitive does not show any alias.
Even when I add a MAYALIAS query (any other function call works too), I have to traverse the PAG to find the new nodes created for the call arguments (nodes 19 and 20 in the example above) before I can finally check that they are aliases. And there is still no easy way to know that I should run alias(19, 20)...
I would suggest a simple way: always query top-level pointers, not address-taken objects. Whenever an object is loaded into a pointer, you can query that pointer instead. In fact, only top-level pointers/registers are used for aliases and queries in real code.
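As an illustration of that suggestion, the sketch below adds an explicit load of the global, so the value read from a becomes a top-level pointer that can be queried directly. The function name readThroughGlobal and the local p are made up for this example. It shows only the shape of the source program, not any SVF call.

```cpp
#include <cassert>

int *a;  // global pointer; its memory object is address-taken

// Hypothetical helper: same store as in the original program, plus a load.
int readThroughGlobal() {
    int b = 0;
    a = &b;          // store: writes &b into the memory object of a
    int *p = a;      // load: the value read from a is a top-level pointer
    assert(p == &b); // at run time, p holds the address of b
    return *p;       // reads b through the loaded pointer
}
```

In the LLVM IR, the value produced by the load of a is the node to pass to a points-to or alias query, rather than the node of the global object itself.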
If node 5 is a top-level pointer, it is fine to query its points-to using pts(5)
Here node 5 is in the PAG above, and that does represent a top-level pointer, i.e., int *a in code.
but if it is an address-taken object, you should query it with a location id, i.e., pts(5, loc)
Sorry, I did not really get what "location id" is (I guess it's something like context?). As far as I know, performing wpa does not take context as argument when checking pts, since there is only one final result.
I would suggest a simple way of always querying top-level pointers but not address-taken objects.
That is exactly what I did. However, using -fspta on top-level pointers gives an unexpected result:
# manual breakpoint set after PTA is done
$ gdb --args wpa -stat=false -ander global-ptr.bc
(gdb) p _pta->getPts(5).count()
$1 = 1
# top level pointer points to stack variable
(gdb) p *_pta->getPts(5).begin()
$2 = 13
$ gdb --args wpa -stat=false -fspta global-ptr.bc
# top level variable points to nothing
(gdb) p _pta->getPts(5).count()
$1 = 0
I'm expecting -ander and -fspta to give the same result on getPts(5), but they do not.
Sorry if I've mixed things up in previous posts. I hope now the question is a little clearer.
Node 5 can't be queried using pts(5) because it is an object, and an object can be defined multiple times at different program points/locations.
You can only use the following APIs to get its points-to set: getDFInPtsSet and getDFOutPtsSet.
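To make the distinction concrete, here is a toy model of why a flow-sensitive result has no single points-to set for an object. It is not SVF's implementation, and it does not reproduce the signatures of getDFInPtsSet or getDFOutPtsSet. ToyFlowSensitive, getOutPts, the location ids 0 and 1, and makeStoreExample are all hypothetical. The model assumes that node 5 is the memory object of a and node 13 is the object of b.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Toy model only: these types and functions are NOT part of SVF.
using NodeID = unsigned;
using LocID = unsigned;
using PointsTo = std::set<NodeID>;

struct ToyFlowSensitive {
    // Top-level pointers: one points-to set each.
    std::map<NodeID, PointsTo> topLevel;
    // Address-taken objects: one OUT set per (location, object) pair.
    std::map<std::pair<LocID, NodeID>, PointsTo> dfOut;

    // Location-free query: only meaningful for top-level pointers.
    PointsTo getPts(NodeID n) const {
        auto it = topLevel.find(n);
        return it == topLevel.end() ? PointsTo{} : it->second;
    }

    // Query the contents of object `obj` right after location `loc`.
    PointsTo getOutPts(LocID loc, NodeID obj) const {
        auto it = dfOut.find({loc, obj});
        return it == dfOut.end() ? PointsTo{} : it->second;
    }
};

// Assumed layout: location 0 is function entry, location 1 is "a = &b".
inline ToyFlowSensitive makeStoreExample() {
    ToyFlowSensitive fs;
    fs.dfOut[{0, 5}] = {};    // before the store, a holds nothing
    fs.dfOut[{1, 5}] = {13};  // after the store, a holds the address of b
    assert(fs.topLevel.empty());  // object 5 has no location-free entry
    return fs;
}
```

In this model, getPts(5) is empty, while getOutPts(1, 5) yields {13}. That matches the pattern in the gdb session above: the location-free query returns nothing for an object, and the per-location sets carry the answer.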