
question: missing edges in ICFG with function pointers

Open 251 opened this issue 1 year ago • 3 comments

Hi,

I'm currently trying to run SVF on busybox 1.36.1. I update the ICFG with the results of the PTA call graph, similar to https://github.com/SVF-tools/SVF/issues/280#issuecomment-664762712. However, when I dump the PTA call graph/ICFG, some functions have no incoming edge. For example, print_s_char has none, although it is stored in a function pointer via the select instruction below:

sw.bb59:                                          ; preds = %if.end54
  %51 = load i32, i32* %fmt, align 4, !dbg !59047, !tbaa !5312
  %cmp60 = icmp eq i32 %51, 0, !dbg !59049
  %52 = zext i1 %cmp60 to i64, !dbg !59047
  %cond = select i1 %cmp60, void (i64, i8*, i8*)* @print_s_char, void (i64, i8*, i8*)* @print_char, !dbg !59047
  store void (i64, i8*, i8*)* %cond, void (i64, i8*, i8*)** %print_function, align 8, !dbg !59050, !tbaa !5237
  br label %sw.epilog, !dbg !59051

The code (here od) is not trivial, but to my understanding those functions eventually get called:

https://github.com/mirror/busybox/blob/f15dfd86c4fba78881071dd0f5c63466fa9737a2/coreutils/od_bloaty.c#L930

Can you confirm the issue or is this a bug/misunderstanding on my side?
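For anyone who wants something smaller than the full busybox bitcode, the pattern can be sketched in C roughly as follows. This is a hypothetical reduction, not the od source: `print_signed`, `print_plain`, `pick_printer`, `call_printer` and `last_called` are names invented for illustration, and whether clang emits a `select` for the conditional expression depends on the optimisation level.

```c
#include <assert.h> /* only needed for the checks in the usage note */

/* Hypothetical stand-ins for print_s_char / print_char. */
static int last_called = 0;

static void print_signed(long n, const char *block, const char *fmt) {
    (void)n; (void)block; (void)fmt;
    last_called = 1;
}

static void print_plain(long n, const char *block, const char *fmt) {
    (void)n; (void)block; (void)fmt;
    last_called = 2;
}

typedef void (*print_fn)(long, const char *, const char *);

/* Slot that holds the chosen callback, playing the role of print_function. */
static print_fn print_function;

/* The conditional expression is what an optimising compiler may turn
 * into a `select` between the two function addresses. */
static void pick_printer(int fmt) {
    print_function = (fmt == 0) ? print_signed : print_plain;
}

/* Indirect call through the stored pointer; returns which target ran. */
int call_printer(int fmt) {
    pick_printer(fmt);
    print_function(0, "", "");
    return last_called;
}
```

At run time both targets are reachable (`call_printer(0)` reaches the first, any other argument the second), so a sound call graph would be expected to contain an edge to each of them.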

busybox.bc.gz (LLVM 13)

251 avatar Oct 04 '23 20:10 251

@jumormt could you take a look at this case?

yuleisui avatar Oct 05 '23 02:10 yuleisui

Hi @251

Thanks for reporting this. I will investigate this case. BTW, can you kindly send me a simplified bitcode or the bitcode for od_bloaty.c?

jumormt avatar Oct 05 '23 03:10 jumormt

Hi @jumormt, here is a busybox build that only includes od: busybox_od_dce.bc.gz

251 avatar Oct 05 '23 09:10 251

Hi,

I recently came across another example (not sure if related):

#include <stddef.h>

int g = 0;

void foo() { g = 1; }
void boo() { g = 2; }
void hoo() { g = 3; }


void *memcpy(void *destaddr, void const *srcaddr, size_t len) {
  char *dest = destaddr;
  char const *src = srcaddr;

  while (len-- > 0)
    *dest++ = *src++;
  return destaddr;
}


int main(int argc, char *argv[]) {
  if (argc > 2)
    return 0;

  void (*fn_ptrs[])() = {foo, boo, hoo};
  fn_ptrs[argc - 1]();

  return g;
}

The issue is that SVF does not create ICFG edges to foo/... when the bitcode uses the included memcpy function to initialise the function pointer array:

if.end:                                           ; preds = %entry
  %i1 = bitcast void (...)** %fn_ptr to i8*, !dbg !92
  call void @llvm.dbg.declare(metadata void (...)** %fn_ptr, metadata !80, metadata !DIExpression()), !dbg !93
  %i2 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !94
  call void @llvm.dbg.declare(metadata void (...)** %fn_ptr2, metadata !84, metadata !DIExpression()), !dbg !95
  store void (...)* bitcast (void ()* @foo to void (...)*), void (...)** %fn_ptr2, align 8, !dbg !95, !tbaa !46
  %i3 = bitcast void (...)** %fn_ptr to i8*, !dbg !96
  %i4 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !96
  %i5 = call i8* @memcpy(i8* %i3, i8* %i4, i64 8), !dbg !96
  %i6 = load void (...)*, void (...)** %fn_ptr, align 8, !dbg !97, !tbaa !46
  call void (...) %i6(), !dbg !97
  %i7 = load i32, i32* @g, align 4, !dbg !98, !tbaa !16
  store i32 %i7, i32* %retval, align 4, !dbg !99
  %i8 = bitcast void (...)** %fn_ptr2 to i8*, !dbg !100
  %i9 = bitcast void (...)** %fn_ptr to i8*, !dbg !100
  br label %return

When it uses the LLVM intrinsic:

if.end:                                           ; preds = %entry
  %1 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !94
  call void @llvm.lifetime.start.p0i8(i64 24, i8* %1) #4, !dbg !94
  call void @llvm.dbg.declare(metadata [3 x void (...)*]* %fn_ptrs, metadata !80, metadata !DIExpression()), !dbg !95
  %2 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !95
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 %2, i8* align 16 bitcast ([3 x void (...)*]* @__const.main.fn_ptrs to i8*), i64 24, i1 false), !dbg !95
  %3 = load i32, i32* %argc.addr, align 4, !dbg !96, !tbaa !16
  %sub = sub nsw i32 %3, 1, !dbg !97
  %idxprom = sext i32 %sub to i64, !dbg !98
  %arrayidx = getelementptr inbounds [3 x void (...)*], [3 x void (...)*]* %fn_ptrs, i64 0, i64 %idxprom, !dbg !98
  %4 = load void (...)*, void (...)** %arrayidx, align 8, !dbg !98, !tbaa !46
  call void (...) %4(), !dbg !98
  %5 = load i32, i32* @g, align 4, !dbg !99, !tbaa !16
  store i32 %5, i32* %retval, align 4, !dbg !100
  %6 = bitcast [3 x void (...)*]* %fn_ptrs to i8*, !dbg !101
  call void @llvm.lifetime.end.p0i8(i64 24, i8* %6) #4, !dbg !101
  br label %return

it works. How do I fix this? Should I add a memcpy function to extapi.c and are there any other functions that could cause such a behaviour (memset, ...)?

251 avatar Mar 05 '24 16:03 251

@251 void (*fn_ptrs[])() = {foo, boo, hoo}; is supposed to use LLVM's memcpy intrinsic: call void @llvm.memcpy.p0i8.p0i8.i64(...). Can you provide your clang command?

shuangxiangkan avatar Mar 06 '24 01:03 shuangxiangkan

Hi @shuangxiangkan,

the bitcode is only partially generated with clang followed by some custom passes. To reproduce it, just compile with clang, disassemble (llvm-dis), change the call in an editor and reassemble (llvm-as).

251 avatar Mar 06 '24 09:03 251

@251 Do you mean manually changing the call to llvm.memcpy.p0i8.p0i8.i64 into a call to your custom memcpy?

If you want to replace llvm.memcpy with memcpy, you can replace the __attribute__((annotate("MEMCPY"))) of llvm_memcpy_p0i8_p0i8_i64 in extapi.c with __attribute__((annotate("OVERWRITE"))), and then replace the empty body with the body of memcpy.
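As a rough sketch of that suggestion, the stub might look like the following. The parameter list is an assumption about how extapi.c declares `llvm_memcpy_p0i8_p0i8_i64` and should be checked against the file in your SVF checkout. The body is simply the byte-wise loop from the custom memcpy posted earlier in this thread.

```c
#include <assert.h> /* only needed for the check in the usage note */

/* Assumed signature; verify it against extapi.c in your SVF version. */
__attribute__((annotate("OVERWRITE")))
void llvm_memcpy_p0i8_p0i8_i64(char *dst, char *src, int sz, int flag)
{
    (void)flag;              /* the intrinsic's volatile flag, unused here */
    while (sz-- > 0)
        *dst++ = *src++;     /* byte-wise copy, as in the custom memcpy */
}
```

The `annotate` attribute is a clang feature; other compilers may ignore it with a warning. Compiled on its own, the function behaves like a plain byte copy, e.g. copying a 4-byte buffer `"abc"` leaves the destination equal to the source.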

shuangxiangkan avatar Mar 06 '24 23:03 shuangxiangkan

@shuangxiangkan

> Do you mean manually change the call

Only to reproduce the behaviour. In my case most llvm intrinsics are lowered to actual function calls via LLVM passes.

> If you want to replace llvm.memcpy with memcpy

Isn't that the wrong way around? I have a call to my memcpy function already and need SVF to treat it as such. Currently it seems to ignore it and does not create ICFG edges.

251 avatar Mar 07 '24 00:03 251

@251 Could you upload the bc file?

shuangxiangkan avatar Mar 07 '24 00:03 shuangxiangkan

I used clang to compile the example, and the resulting IR uses the LLVM intrinsic. It's not clear to me how it ends up calling your memcpy.

10:                                               ; preds = %2
  %11 = bitcast [3 x void (...)*]* %6 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %11, i8* align 8 bitcast ([3 x void (...)*]* @__const.main.fn_ptrs to i8*), i64 24, i1 false)
  %12 = load i32, i32* %4, align 4
  %13 = sub nsw i32 %12, 1
  %14 = sext i32 %13 to i64
  %15 = getelementptr inbounds [3 x void (...)*], [3 x void (...)*]* %6, i64 0, i64 %14
  %16 = load void (...)*, void (...)** %15, align 8
  %17 = bitcast void (...)* %16 to void ()*
  call void %17()
  %18 = load i32, i32* @g, align 4
  store i32 %18, i32* %3, align 4
  br label %19

shuangxiangkan avatar Mar 07 '24 01:03 shuangxiangkan

> I'm not clear how it links to your memcpy?

As I said: "clang followed by some custom passes". LLVM is a compiler framework, clang is not the only way to produce bitcode.

> Could you upload the bc file?

Sure: test.zip

Edit: The odd thing is that even when I rename memcpy to prefix_memcpy, SVF cannot resolve the function pointers.

251 avatar Mar 07 '24 11:03 251

@shuangxiangkan Are you able to reproduce it?

251 avatar Mar 08 '24 15:03 251

Yes. Are you going to use memcpy to replace the LLVM intrinsic, and implement the same functionality as the LLVM intrinsic?

shuangxiangkan avatar Mar 09 '24 00:03 shuangxiangkan

In SVF, the handling of the src and dest of llvm.memcpy.p0i8.p0i8.i64(void *dest, const void *src, size_t len) is based on the number of fields rather than the number of bytes. For example, if a struct has 3 fields, SVF copies all 3 fields instead of copying only len bytes. For details, see: https://github.com/SVF-tools/SVF/blob/master/svf-llvm/lib/SVFIRExtAPI.cpp#L78-L122
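To make the difference concrete, here is a small C sketch. It is not SVF code: `Handlers`, `copy_by_len`, `copy_by_fields` and the two `fields_after_*` helpers are hypothetical names, and `copy_by_fields` only mimics the idea of copying every field regardless of `len`.

```c
#include <assert.h> /* only needed for the checks in the usage note */
#include <stddef.h>
#include <string.h>

typedef void (*handler)(void);

/* A struct with three pointer fields, mirroring the 3-field example. */
typedef struct {
    handler a;
    handler b;
    handler c;
} Handlers;

static void foo(void) {}
static void boo(void) {}
static void hoo(void) {}

/* Concrete semantics: exactly `len` bytes are copied. */
static void copy_by_len(Handlers *dst, const Handlers *src, size_t len) {
    memcpy(dst, src, len);
}

/* Field-based model (illustration only): every field is copied,
 * whatever `len` says. */
static void copy_by_fields(Handlers *dst, const Handlers *src, size_t len) {
    (void)len;
    dst->a = src->a;
    dst->b = src->b;
    dst->c = src->c;
}

/* Counts how many fields of the destination are non-null. */
static int count_set(const Handlers *h) {
    return (h->a != NULL) + (h->b != NULL) + (h->c != NULL);
}

int fields_after_len_copy(void) {
    Handlers src = {foo, boo, hoo};
    Handlers dst = {NULL, NULL, NULL};
    copy_by_len(&dst, &src, sizeof(handler)); /* one pointer's worth of bytes */
    return count_set(&dst);
}

int fields_after_field_copy(void) {
    Handlers src = {foo, boo, hoo};
    Handlers dst = {NULL, NULL, NULL};
    copy_by_fields(&dst, &src, sizeof(handler));
    return count_set(&dst);
}
```

With a length of one pointer, the byte-accurate copy fills only the first field, while the field-based model fills all three. The model therefore over-approximates what the concrete copy does, which is the usual trade-off in a static analysis.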

shuangxiangkan avatar Mar 09 '24 01:03 shuangxiangkan

SVF handles llvm.memcpy in a specialized manner due to its field-index-based memory modeling. Consequently, SVF manages it through hard-coded methods instead of introducing a stub implementation in extapi.c. In fact, memcpy is always sensitive to the caller, depending on the input parameters and their object size, making it challenging to model precisely in a static context.

Attempting to overwrite llvm.memcpy with your own version will not be effective here. It's also unclear why the LLVM intrinsic would need to be replaced. While you can always call your own memcpy function, for struct assignments like void (*fn_ptrs[])() = {foo, boo, hoo}, I recommend retaining LLVM's llvm.memcpy, which SVF handles correctly.

yuleisui avatar Mar 09 '24 08:03 yuleisui

Hi @shuangxiangkan, @yuleisui,

Thanks for the explanation, I need to check how I can mitigate that.

> unclear why there would be a need to replace the LLVM intrinsic version

I'd have to implement a handler for the intrinsic to execute the bitcode with KLEE.

Are there any other functions that would cause similar issues (memset, ...)?

251 avatar Mar 11 '24 14:03 251

@shuangxiangkan could you point them out?

yuleisui avatar Mar 11 '24 21:03 yuleisui

memcpy, memmove, memccpy, bcopy, strncpy, iconv and memset have similar issues.

shuangxiangkan avatar Mar 12 '24 08:03 shuangxiangkan

Thanks @shuangxiangkan. For memcpy, memmove and memset I can see how they map to LLVM intrinsics. But how does SVF "interfere" with memccpy, bcopy, iconv, and strncpy?
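For reference, the mapping for the first three can be seen from plain C. The sketch below uses hypothetical names (`Table`, `copy_table`, `shift_table`, `clear_table`, `demo_table_ops`). The comments describe what clang typically emits, which can vary with aggregate size, optimisation level and flags such as -fno-builtin.

```c
#include <assert.h> /* only needed for the check in the usage note */
#include <string.h>

typedef void (*handler)(void);

typedef struct {
    handler slots[8];
} Table;

static void foo(void) {}

/* Aggregate assignment: typically lowered to llvm.memcpy. */
void copy_table(Table *dst, const Table *src) {
    *dst = *src;
}

/* Overlapping copy inside one object: typically llvm.memmove. */
void shift_table(Table *t) {
    memmove(&t->slots[1], &t->slots[0], 7 * sizeof(handler));
}

/* Zero fill: typically llvm.memset. */
void clear_table(Table *t) {
    memset(t, 0, sizeof *t);
}

/* Returns 1 if the function pointer survives the copy and the shift. */
int demo_table_ops(void) {
    Table src, dst;
    clear_table(&src);
    src.slots[0] = foo;
    copy_table(&dst, &src);
    shift_table(&dst);
    return dst.slots[0] == foo && dst.slots[1] == foo;
}
```

Disassembling the result of `clang -emit-llvm -c` with llvm-dis should show which intrinsic, if any, each helper ends up using for a given toolchain.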

251 avatar Mar 12 '24 10:03 251