SVF
SVF copied to clipboard
WPA tool crashes with -locMM switch on string handling
I'm trying to build an analysis where pointers stored in an array do not share the same points-to set, to be able to distinguish the objects that may be pointed by each of the pointers in the array. However, enabling the -locMM flag (for which I'm still not sure it would solve my problem but saw #11 ), segfaults the wpa tool during PointerAnalysis initialization, while parsing string accesses.
To reproduce (a minimal poc):
main.c
#include <stdio.h>
int main(int argc, char **argv) {
printf("hello\n");
return 0;
}
then execute:
$ clang -c -fno-discard-value-names -emit-llvm main.c -o main.bc
$ opt -mem2reg main.bc -o main.opt
$ wpa -ander -debug --dwarn=0 --print-all-pts -locMM -stats=0 main.opt
The tool segfaults with the following output:
> $ wpa -ander -debug --dwarn=0 --print-all-pts -locMM -stats=0 main.opt
Args: wpa -ander -debug --dwarn=0 --print-all-pts -locMM -stats=0 main.opt
Building PAG ...
Building Symbol table ...
collect sym from ##@.str = private unnamed_addr constant [7 x i8] c"hello\0A\00", align 1
create a new value sym 4
handle constant expression i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0)
create a new value sym 5
[1] 24137 segmentation fault (core dumped) wpa -ander -debug --dwarn=0 --print-all-pts -locMM -stats=0 main.opt
At a quick analysis in a debugger the following stack trace is generated:
[#0] 0x555555636122 → SymbolTableInfo::getStructInfo(this=0x5555566cfbf0, T=0x5555566d1150)
[#1] 0x555555632be1 → SymbolTableInfo::isConstantObjSym(this=0x5555566cfbf0, val=0x5555566d2858)
[#2] 0x5555556323e8 → SymbolTableInfo::collectVal(this=0x5555566cfbf0, val=0x5555566d2858)
[#3] 0x555555632db7 → SymbolTableInfo::handleCE(this=0x5555566cfbf0, val=0x5555566d90c8)
[#4] 0x55555563319b → SymbolTableInfo::handleGlobalCE(this=0x5555566cfbf0, G=0x5555566d2858)
[#5] 0x5555556323cf → SymbolTableInfo::collectVal(this=0x5555566cfbf0, val=0x5555566d2858)
[#6] 0x555555633033 → SymbolTableInfo::handleCE(this=0x5555566cfbf0, val=0x5555566d2858)
[#7] 0x5555556321c8 → SymbolTableInfo::collectSym(this=0x5555566cfbf0, val=0x5555566d2858)
[#8] 0x555555631883 → SymbolTableInfo::buildMemModel(this=0x5555566cfbf0, svfModule=0x5555566c9610)
[#9] 0x5555556a22a1 → PointerAnalysis::initialize(this=0x5555566cc368, svfModule=0x5555566c9610)
And the tool seems to crash in the following statement in lib/SVF-FE/SymbolTableInfo.cpp:
341 ///Get a reference to StructInfo.
342 inline StInfo* getStructInfo(const Type *T) {
→ 343 return getStructInfoIter(T)->second;
344 }
_____________________________________________________________
→ 0x555555636122 <SymbolTableInfo::getStructInfo(llvm::Type+0> mov rax, QWORD PTR [rax+0x8]
Due to getStructInfoIter(T) returning NULL. I suspect Type information not getting properly initialized.
-locMM has not been supported for quite a while after SVF migrated to LLVM-7.0 onwards. We would appreciate you can adapt or help us with this option.
If I understand it correctly, with this option SVF would allow me to treat array items as if they were different fields of a struct. Is that correct? In an array of pointers, I would like to be able to differentiate the points-to sets at least for constant indexes of the array.
I have something like this:
#define M_POINTS 8
void foo() {
unsigned char* M[M_POINTS];
int i = 0;
#pragma unroll(M_POINTS)
for (i=0; i < M_POINTS; ++i) {
M[i] = (unsigned char*)malloc(1024);
if (M[i] == 0) {
return;
}
}
/* operations on the array of pointers */
printf("%p\n", M[0]);
printf("%p\n", M[1]);
return;
}
I have manually set the malloc loop to be unrolled, otherwise, all the objects would be from the same allocation site, and all the pointers would share the same object node if I'm not wrong. I have not only operations on fixed indexes, but also operations striding all the indices of the array of pointers, like:
for (i=0; i < M_POINTS; ++i) {
bar(M[i]);
}
In this example bar then performs memory accesses using the pointer argument.
Is there any way to be able to distinguish also these operations? I don't think so without unrolling the loop and inlining the call, but correct me if I'm wrong.
I have to analyze the possible objects an operation may access, and minimizing the object set would really help. In my use case, I can apply semantic preserving code transformations.
If you could confirm that -locMM would help me to manage these issues, I will be happy to try looking into it and post my findings here. Have you any hint on where to start looking?
Hi @yuleisui do you have any guess on what may have broken the -locMM option starting from LLVM-7.0? And do you have any thoughts on my previous message?
Thank you
Sorry for the late reply. some time to debug. Overloaded recently... The implementation was quite a while ago and was not used or tested since 9.0.0. I will have to find some time to debug if you can not solve it.
The key idea is to get the byte-size of an object. You may wish to take a look at this first: https://github.com/SVF-tools/SVF/blob/master/include/SVF-FE/SymbolTableInfo.h#L451-L503
No problem and thank you for all the work on the project :)
It seems the crash being caused by getStructInfoIter calling collectTypeInfo on an ArrayType, which then calls collectArrayInfo that is an empty function (all code commented out which then does not fill the type info in typeToFieldInfo) for LocSymTableInfo::collectArrayInfo.
It seems the whole function body was commented out in commit a569935b57d8aa80ddad405890aa6736c70b1a07 along with different changes on MemModel.
I'm not sure if the fix would be simply to uncomment the code and port that to use the new methods in the class, or if the commented code was done intentionally to move the locMM logic outside, so should remain commented and the fix to do is somewhere else.
You can uncomment the code and have a try and gradually adapt the code to this new version.
Ok I have a working version of the locMM switch on the last commit now, but before issuing a PR I would like to understand something better:
Looking at the code it seems that the functionality of LocSymTableInfo::collectArrayInfo and LocSymTableInfo::collectStructInfo have been merged into SymbolTableInfo::collectArrayInfo and SymbolTableInfo::collectStructInfo that now fill the FieldInfo of the types both with field indexes and offsets.
Indeed removing the methods from LocSymTableInfo and making it use the parent class ones seems to correctly fill the metadata.
The problem is that in various places in the code the LocationSet::getOffset() method is used, which returns the field index.
Examples of such usage are in PAG::addGepObjNode with:
NodeID gepId = (ls.getOffset() + 1) * gepMultiplier + base;
or in PAG::getGepObjNode with:
// Base and first field are the same memory location.
if (FirstFieldEqBase && ls.getOffset() == 0) return base;
This makes the computations to reason only on the field index. Manually changing such occurrences to LocationSet::getByteOffset() makes the -locMM switch to work, but obviously would break "non locMM" compatibility.
And in fact running the test suite, the following tests fail:
The following tests FAILED:
4 - basic_c_tests/array-constIdx.c (Child aborted)
56 - fs_tests/array_alias_1.c (Child aborted)
57 - fs_tests/array_alias_2.c (Child aborted)
59 - fs_tests/array_alias_4.c (Child aborted)
60 - fs_tests/array_alias_5.c (Child aborted)
As a reference the first one is reported:
#include "aliascheck.h"
struct MyStruct {
int * f1;
int * f2;
};
int main() {
struct MyStruct s[2];
int a,b;
s[0].f1 = &a;
s[1].f1 = &b;
// Different fields of different elements in a
// certain array are treated as different objects.
NOALIAS(s[0].f1, s[1].f2);
MAYALIAS(s[0].f1, s[1].f1);
return 0;
}
which fails on
[AndersenWPA] Checking MAYALIAS
FAILURE :MAYALIAS check <id:289, id:293> at ({ })
This should show that the 0th and 1st entry of the array are treated as a two different fields, as in locMM
Have you any advice on how to deal with that? I could add a method to SymbolTableInfo to provide a boolean locMM state, on which decide if the code should use LocationSet::getByteOffset() or LocationSet::getOffset() but I'm not sure if it is clean enough.
Ok I have a working version of the
locMMswitch on the last commit now, but before issuing a PR I would like to understand something better:Looking at the code it seems that the functionality of
LocSymTableInfo::collectArrayInfoandLocSymTableInfo::collectStructInfohave been merged intoSymbolTableInfo::collectArrayInfoandSymbolTableInfo::collectStructInfothat now fill theFieldInfoof the types both with field indexes and offsets. Indeed removing the methods fromLocSymTableInfoand making it use the parent class ones seems to correctly fill the metadata.The problem is that in various places in the code the
LocationSet::getOffset()method is used, which returns the field index.Examples of such usage are in PAG::addGepObjNode with:
NodeID gepId = (ls.getOffset() + 1) * gepMultiplier + base;or in PAG::getGepObjNode with:
// Base and first field are the same memory location. if (FirstFieldEqBase && ls.getOffset() == 0) return base;This makes the computations to reason only on the field index. Manually changing such occurrences to
LocationSet::getByteOffset()makes the-locMMswitch to work, but obviously would break "non locMM" compatibility.
Yes, this will break the non locMM. One way to do is to still invoke LocationSet::getOffset() but return its byte offset?
And in fact running the test suite, the following tests fail:
The following tests FAILED: 4 - basic_c_tests/array-constIdx.c (Child aborted) 56 - fs_tests/array_alias_1.c (Child aborted) 57 - fs_tests/array_alias_2.c (Child aborted) 59 - fs_tests/array_alias_4.c (Child aborted) 60 - fs_tests/array_alias_5.c (Child aborted)As a reference the first one is reported:
#include "aliascheck.h" struct MyStruct { int * f1; int * f2; }; int main() { struct MyStruct s[2]; int a,b; s[0].f1 = &a; s[1].f1 = &b; // Different fields of different elements in a // certain array are treated as different objects. NOALIAS(s[0].f1, s[1].f2); MAYALIAS(s[0].f1, s[1].f1); return 0; }which fails on
[AndersenWPA] Checking MAYALIAS FAILURE :MAYALIAS check <id:289, id:293> at ({ })This should show that the 0th and 1st entry of the array are treated as a two different fields, as in
locMMHave you any advice on how to deal with that? I could add a method to
SymbolTableInfoto provide a booleanlocMMstate, on which decide if the code should useLocationSet::getByteOffset()orLocationSet::getOffset()but I'm not sure if it is clean enough.
The best solution is to initialized LocationSet::getByteOffset() and LocationSet::getOffset() properly using LocSymTableInfo, however, I guess there is one or two places as you mentioned, only used LocationSet::getOffset() to create GepObjNode, which makes LocMM and non-locMM hard to co-exist. Any idea? @mbarbar @pietroborrello
I was thinking of adding a third method to LocationSet like LocationSet::getByteOrFieldOffset(bool returnByteOff) that returns either the byte or field offset based on the boolean flag, which can be constructed like bool returnByteOff = SVFUtil::isa<LocSymTableInfo>(SymbolTableInfo::Symbolnfo()) but it appears more as an hack to me, than a clean solution
Moreover, I'm not sure if I can blindly substitute any occurrence of LocationSet::getOffset() with the method, or if there is any point in the code which relies on LocationSet::getOffset() regardless of the locMM flag
I was thinking of adding a third method to LocationSet like
LocationSet::getByteOrFieldOffset(bool returnByteOff)that returns either the byte or field offset based on the boolean flag, which can be constructed likebool returnByteOff = SVFUtil::isa<LocSymTableInfo>(SymbolTableInfo::Symbolnfo())but it appears more as an hack to me, than a clean solution
LocSymTableInfo is a child class of SymbolTableInfo, I think it would be good to use polymorphism (making getOffset() as a virtual function) to return the corresponding byteoffset?
LocSymTableInfois a child class ofSymbolTableInfo, I think it would be good to use polymorphism (makinggetOffset()as a virtual function) to return the corresponding byteoffset?
So do you mean creating a child class for LocationSet which returns the byteoffset when LocationSet::getOffset is called? The calls I'm referring to are in the PAG class and not in LocSymTableInfo nor SymbolTableInfo. Also, the problem is that the LocationSet is created outside of the SymbolTableInfo, for example in PAGBuilder::processCE, so would be unaware of which object should create. Maybe he could check the class of SymbolTableInfo::Symbolnfo(). Did I misunderstand something?
LocationSet is used to maintain both field and byteoffset information, I do not think we need a child class of it. Maybe you can try to submit a pull request and we can start discussion from there?