SVF
question: missing edges in ICFG with function pointers
Hi,
I'm currently trying to run SVF on busybox 1.36.1. I update the ICFG with the results of the PTA call graph, similar to https://github.com/SVF-tools/SVF/issues/280#issuecomment-664762712. However, when I dump the PTA call graph/ICFG, print_s_char (for example) has no incoming edge, although it is stored in a function pointer via the select instruction below:
sw.bb59: ; preds = %if.end54
  %51 = load i32, i32* %fmt, align 4, !dbg !59047, !tbaa !5312
  %cmp60 = icmp eq i32 %51, 0, !dbg !59049
  %52 = zext i1 %cmp60 to i64, !dbg !59047
  %cond = select i1 %cmp60, void (i64, i8*, i8*)* @print_s_char, void (i64, i8*, i8*)* @print_char, !dbg !59047
  store void (i64, i8*, i8*)* %cond, void (i64, i8*, i8*)** %print_function, align 8, !dbg !59050, !tbaa !5237
  br label %sw.epilog, !dbg !59051
The code (here od) is not trivial, but to my understanding those functions eventually get called:
https://github.com/mirror/busybox/blob/f15dfd86c4fba78881071dd0f5c63466fa9737a2/coreutils/od_bloaty.c#L930
Can you confirm the issue or is this a bug/misunderstanding on my side?
busybox.bc.gz (LLVM 13)
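For readers without the bitcode at hand, here is a reduced C sketch of the pattern in question. It is not the busybox source: the function bodies, the `last_called` flag, and the helpers `choose_printer` and `call_printer` are hypothetical and only mirror the shape of the IR above, where a conditional expression picks one of two functions and the result is later called indirectly. With optimisation enabled, clang typically lowers such a conditional to a `select` between the two function addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for print_s_char / print_char; the real
 * functions live in coreutils/od_bloaty.c and do actual printing. */
static int last_called = 0;

static void print_s_char(size_t n, const char *block, const char *fmt) {
    (void)n; (void)block; (void)fmt;
    last_called = 1;
}

static void print_char(size_t n, const char *block, const char *fmt) {
    (void)n; (void)block; (void)fmt;
    last_called = 2;
}

typedef void (*print_fn)(size_t, const char *, const char *);

/* Both functions are possible results of this expression, so a call
 * graph built from points-to results should list both as callees. */
print_fn choose_printer(int fmt) {
    return (fmt == 0) ? print_s_char : print_char;
}

/* Indirect call through the selected pointer; returns which stand-in ran. */
int call_printer(int fmt) {
    print_fn print_function = choose_printer(fmt);
    assert(print_function != NULL);
    print_function(0, "", "");
    return last_called;
}
```

Calling `call_printer(0)` reaches `print_s_char` and any other argument reaches `print_char`, so both should appear as targets of the indirect call site.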
@jumormt could you take a look at this case?
Hi @251
Thanks for reporting this. I will investigate this case. BTW, can you kindly send me a simplified bitcode or the bitcode for od_bloaty.c?
Hi @jumormt, here is a busybox build that only includes od: busybox_od_dce.bc.gz
Hi,
I recently came across another example (not sure if related):
#include <stddef.h>

int g = 0;

void foo() { g = 1; }
void boo() { g = 2; }
void hoo() { g = 3; }

void *memcpy(void *destaddr, void const *srcaddr, size_t len) {
    char *dest = destaddr;
    char const *src = srcaddr;
    while (len-- > 0)
        *dest++ = *src++;
    return destaddr;
}

int main(int argc, char *argv[]) {
    if (argc > 2)
        return 0;
    void (*fn_ptrs[])() = {foo, boo, hoo};
    fn_ptrs[argc - 1]();
    return g;
}
The issue is that SVF does not create edges in the ICFG to foo/... when the bitcode uses the included memcpy function to initialise the function pointer array:
if.end: ; preds = %entry
%i1 = bitcast void (...)** %fn_ptr to i8*, !dbg !92
call void @llvm.dbg.declare(metadata void (...)** %fn_ptr, metadata !80, metadata !DIExpression()), !dbg !93
%i2 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !94
call void @llvm.dbg.declare(metadata void (...)** %fn_ptr2, metadata !84, metadata !DIExpression()), !dbg !95
store void (...)* bitcast (void ()* @foo to void (...)*), void (...)** %fn_ptr2, align 8, !dbg !95, !tbaa !46
%i3 = bitcast void (...)** %fn_ptr to i8*, !dbg !96
%i4 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !96
%i5 = call i8* @memcpy(i8* %i3, i8* %i4, i64 8), !dbg !96
%i6 = load void (...)*, void (...)** %fn_ptr, align 8, !dbg !97, !tbaa !46
call void (...) %i6(), !dbg !97
%i7 = load i32, i32* @g, align 4, !dbg !98, !tbaa !16
store i32 %i7, i32* %retval, align 4, !dbg !99
%i8 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !100
%i9 = bitcast void (...)** %fn_ptr to i8*, !dbg !100
br label %return
When it uses the LLVM intrinsic:
if.end: ; preds = %entry
%1 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !94
call void @llvm.lifetime.start.p0i8(i64 24, i8* %1) #4, !dbg !94
call void @llvm.dbg.declare(metadata [3 x void (...)*]* %fn_ptrs, metadata !80, metadata !DIExpression()), !dbg !95
%2 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !95
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 %2, i8* align 16 bitcast ([3 x void (...)*]* @__const.main.fn_ptrs to i8*), i64 24, i1 false), !dbg !95
%3 = load i32, i32* %argc.addr, align 4, !dbg !96, !tbaa !16
%sub = sub nsw i32 %3, 1, !dbg !97
%idxprom = sext i32 %sub to i64, !dbg !98
%arrayidx = getelementptr inbounds [3 x void (...)*], [3 x void (...)*]* %fn_ptrs, i64 0, i64 %idxprom, !dbg !98
%4 = load void (...)*, void (...)** %arrayidx, align 8, !dbg !98, !tbaa !46
call void (...) %4(), !dbg !98
%5 = load i32, i32* @g, align 4, !dbg !99, !tbaa !16
store i32 %5, i32* %retval, align 4, !dbg !100
%6 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !101
call void @llvm.lifetime.end.p0i8(i64 24, i8* %6) #4, !dbg !101
br label %return
it works. How do I fix this? Should I add a memcpy function to extapi.c, and are there any other functions that could cause such behaviour (memset, ...)?
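The IR listing that calls @memcpy corresponds roughly to C like the sketch below: one function pointer is stored, copied byte by byte into a second slot, and then called through the copy. This is a reconstruction for illustration only. The names `my_memcpy` and `call_through_copy` are hypothetical (the copy routine is renamed so it does not clash with the libc symbol), and the exact output of the custom passes is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

static int g = 0;

static void foo(void) { g = 1; }

/* Byte-wise copy with the same body as the memcpy in the post,
 * under a hypothetical name. */
static void *my_memcpy(void *destaddr, const void *srcaddr, size_t len) {
    char *dest = destaddr;
    const char *src = srcaddr;
    while (len-- > 0)
        *dest++ = *src++;
    return destaddr;
}

/* Stores foo, copies the pointer through my_memcpy, calls the copy. */
int call_through_copy(void) {
    void (*fn_ptr2)(void) = foo;
    void (*fn_ptr)(void) = NULL;

    my_memcpy(&fn_ptr, &fn_ptr2, sizeof fn_ptr);
    assert(fn_ptr != NULL);

    g = 0;
    fn_ptr();   /* the indirect call whose target should be foo */
    return g;
}
```

At run time the indirect call always reaches `foo`, so an ICFG edge from that call site to `foo` is the expected result.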
@251
void (*fn_ptrs[])() = {foo, boo, hoo};
is supposed to use LLVM's memcpy:
call void @llvm.memcpy.p0i8.p0i8.i64(...)
Can you provide your clang command?
Hi @shuangxiangkan,
the bitcode is only partially generated with clang followed by some custom passes. To reproduce it, just compile with clang, disassemble (llvm-dis), change the call in an editor and reassemble (llvm-as).
@251
Do you mean manually changing the call to llvm.memcpy.p0i8.p0i8.i64 into a call to your custom memcpy?
If you want to replace llvm.memcpy with memcpy, you can replace the __attribute__((annotate("MEMCPY"))) of llvm_memcpy_p0i8_p0i8_i64 in extapi.c with __attribute__((annotate("OVERWRITE"))), and then replace the empty body with the body of memcpy.
@shuangxiangkan
Do you mean manually change the call
Only to reproduce the behaviour. In my case most llvm intrinsics are lowered to actual function calls via LLVM passes.
If you want to replace llvm.memcpy with memcpy
Isn't that the wrong way around? I have a call to my memcpy function already and need SVF to treat it as such. Currently it seems to ignore it and does not create ICFG edges.
@251 Could you upload the bc file?
I used clang to compile the example, and the IR uses the LLVM intrinsic. I'm not clear how it links to your memcpy?
10: ; preds = %2
%11 = bitcast [3 x void (...)*]* %6 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %11, i8* align 8 bitcast ([3 x void (...)*]* @__const.main.fn_ptrs to i8*), i64 24, i1 false)
%12 = load i32, i32* %4, align 4
%13 = sub nsw i32 %12, 1
%14 = sext i32 %13 to i64
%15 = getelementptr inbounds [3 x void (...)*], [3 x void (...)*]* %6, i64 0, i64 %14
%16 = load void (...)*, void (...)** %15, align 8
%17 = bitcast void (...)* %16 to void ()*
call void %17()
%18 = load i32, i32* @g, align 4
store i32 %18, i32* %3, align 4
br label %19
I'm not clear how it links to your memcpy?
As I said: "clang followed by some custom passes". LLVM is a compiler framework, clang is not the only way to produce bitcode.
Could you upload the bc file?
Sure: test.zip
Edit: The odd thing is that even when I rename memcpy to prefix_memcpy, it cannot resolve the function pointers.
@shuangxiangkan Are you able to reproduce it?
Yes. Are you going to use memcpy to replace the LLVM intrinsic, and implement the same functionality as the LLVM intrinsic?
In SVF, the handling of the src and dest of llvm.memcpy.p0i8.p0i8.i64(void *dest, const void * src, size_t len) is based on the number of fields rather than the number of bytes. For example, if a struct has 3 fields, SVF copies over those 3 fields instead of copying based on len. You can refer to: https://github.com/SVF-tools/SVF/blob/master/svf-llvm/lib/SVFIRExtAPI.cpp#L78-L122
SVF handles llvm.memcpy in a specialized manner due to its field-index-based memory modeling. Consequently, SVF manages it through hard-coded methods instead of a stub implementation in extapi.c. In fact, the effect of memcpy always depends on the caller, that is, on the input parameters and the sizes of the objects they point to, which makes it challenging to model precisely in a static context.
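To make the field-versus-byte distinction concrete, here is a small C sketch. The struct `ops` and all function names are hypothetical. At run time the memcpy copies only as many bytes as one function pointer occupies, so only the first field changes. Going by the description above, a field-based model would instead treat all three fields as copied, which over-approximates the targets of `dst.b` and `dst.c` here; this has not been checked against SVF's actual output.

```c
#include <assert.h>
#include <string.h>

typedef int (*op_fn)(void);

static int zero(void)  { return 0; }
static int one(void)   { return 1; }
static int two(void)   { return 2; }
static int three(void) { return 3; }

/* Hypothetical struct with three function-pointer fields. */
struct ops {
    op_fn a;
    op_fn b;
    op_fn c;
};

/* Copies only the first field's bytes, then calls all three fields. */
int run_partial_copy(void) {
    struct ops src = { one, two, three };
    struct ops dst = { zero, zero, zero };

    assert(sizeof(struct ops) >= sizeof(op_fn));
    /* Concrete semantics: len covers field a only. */
    memcpy(&dst, &src, sizeof(op_fn));

    /* a now points to one; b and c still point to zero. */
    return dst.a() * 100 + dst.b() * 10 + dst.c();
}
```

The same call with a different len would change which fields are actually overwritten, which is the caller dependence mentioned above.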
Attempting to overwrite llvm.memcpy with your version will not be effective here. It's unclear why there would be a need to replace the LLVM intrinsic version. While you can always call your own memcpy function, for aggregate initialisations like void (*fn_ptrs[])() = {foo, boo, hoo}, I recommend retaining LLVM's llvm.memcpy, which SVF will handle correctly.
Hi @shuangxiangkan, @yuleisui,
Thanks for the explanation, I need to check how I can mitigate that.
unclear why there would be a need to replace the LLVM intrinsic version
I'd have to implement a handler for the intrinsic to execute the bitcode with KLEE.
Are there any other functions that would cause similar issues (memset, ...)?
@shuangxiangkan could you point out?
memcpy, memmove, memccpy, bcopy, strncpy, iconv, and memset have similar issues.
Thanks @shuangxiangkan. For memcpy, memmove and memset I can see how they map to LLVM intrinsics. But how does SVF "interfere" with memccpy, bcopy, iconv, and strncpy?