
AddressSanitizer + struct allocated on stack with function pointer field

Open acidghost opened this issue 2 years ago • 9 comments

While doing some experiments on a program compiled with AddressSanitizer (ASan) I discovered that some supposedly trivial function pointers had an empty points-to set. Here's a minimal reproducer:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct mystruct {
    void (*fn)();
};

void indir1()
{
    puts("indir1");
}

void indir2()
{
    puts("indir2");
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 2) {
        return 0;
    }

    struct mystruct s;

    void (*fn)() = NULL;
    if (data[0] < 42) {
        fn = indir1;
    } else {
        fn = indir2;
    }

    s.fn = fn;
    s.fn();

    return 0;
}

I compiled it with clang from LLVM 14.0.6, using clang -emit-llvm -O0 -g -c -fsanitize=address -fsanitize-address-use-after-return=never (https://godbolt.org/z/zMb3qn47M).

Running Andersen's PTA with wpa -ander -print-fp shows no targets for the indirect call site s.fn().

I think that the problem lies in the fact that ASan transforms the stack allocation of the struct from

  %s = alloca %struct.mystruct, align 8

to

  %MyAlloca = alloca i8, i64 64, align 32
  %0 = ptrtoint i8* %MyAlloca to i64
  %1 = add i64 %0, 32
  %2 = inttoptr i64 %1 to %struct.mystruct*

Is this related to #867?

I've tried allocating the struct on the heap: ASan does not perform the same transformation, and PTA finds targets for the call site.
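
A heap-allocated variant of the reproducer could look roughly like the sketch below. This is an assumption for illustration only: the exact code used for the heap experiment is not shown in this thread, and the malloc/free placement is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct mystruct {
    void (*fn)();
};

static void indir1() { puts("indir1"); }
static void indir2() { puts("indir2"); }

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 2) {
        return 0;
    }

    /* Heap allocation instead of `struct mystruct s;` on the stack. */
    struct mystruct *s = malloc(sizeof *s);
    if (s == NULL) {
        return 0;
    }

    void (*fn)() = NULL;
    if (data[0] < 42) {
        fn = indir1;
    } else {
        fn = indir2;
    }

    s->fn = fn;
    s->fn();   /* the indirect call site of interest */

    free(s);
    return 0;
}
```

Here the struct pointer comes straight from the malloc call rather than from a redzone-padded stack frame, which matches the observation above that the ptrtoint/inttoptr sequence does not appear for the struct.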

Any pointers on how this could be handled are much appreciated.

acidghost avatar Aug 30 '23 21:08 acidghost

Thanks for reporting this. SVF currently does not support inttoptr and ptrtoint (int2ptr/ptr2int) during pointer analysis. We may support them later, but handling them soundly is itself a hard research topic. One possible solution for your case is to "rewrite" your ASan instrumentation before sending the bc to SVF, manually replacing the ptrtoint and inttoptr instructions with bitcast instructions. SVF then simply uses a CopyStmt for each of them, so the assignments are reflected in the constraint graph.
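
At the C level, the difference between the two forms can be sketched as follows. This is only an analogy to the IR, not SVF or ASan code; the helper names slot_via_integer and slot_via_pointer are hypothetical, and the offset 32 is taken from the IR in the first comment.

```c
#include <stdint.h>

struct mystruct {
    void (*fn)();
};

/* Shape emitted by ASan: the address round-trips through an integer. */
struct mystruct *slot_via_integer(unsigned char *frame)
{
    uintptr_t base = (uintptr_t)frame;   /* ptrtoint i8* %MyAlloca to i64 */
    uintptr_t addr = base + 32;          /* add i64 %0, 32 */
    return (struct mystruct *)addr;      /* inttoptr i64 %1 to %struct.mystruct* */
}

/* Rewritten shape: the value stays a pointer the whole way. */
struct mystruct *slot_via_pointer(unsigned char *frame)
{
    unsigned char *p = frame + 32;       /* getelementptr i8, i8* %MyAlloca, i64 32 */
    return (struct mystruct *)p;         /* bitcast i8* ... to %struct.mystruct* */
}
```

Both helpers return the same address for a suitably aligned 64-byte frame. The second form keeps a pointer-to-pointer data flow, which is what a pointer analysis can follow through copy and GEP edges.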

yuleisui avatar Aug 30 '23 23:08 yuleisui

Thank you for your help. I did some experiments by manually editing the LLVM IR but I got stuck.

I initially tried to replace the pointer arithmetic introduced by ASan:

  %MyAlloca = alloca i8, i64 64, align 32
  %0 = ptrtoint i8* %MyAlloca to i64
  %1 = add i64 %0, 32
  %2 = inttoptr i64 %1 to %struct.mystruct*

with

  %MyAlloca = alloca i8, i64 64, align 32
  %0 = getelementptr i8, i8* %MyAlloca, i64 32
  %1 = bitcast i8* %0 to %struct.mystruct*

but SVF would not find targets for the indirect call site.

I tried to simplify it even further and removed the GEP (leaving just an alloca followed by a bitcast) but it didn't work either.

I have inspected the SVFIR for both my modified version and the original version (w/o ASan or my changes, as posted in the first comment), and both seem correct. I did the same for the constraint graphs:

  • the initial graphs look identical, except for the copy edge introduced by the additional bitcast in the modified version
  • the final graph for the version that works (i.e. original) adds a node that links %3 = load ... to %4 = load ... (ref. code below)

[Screenshot, 2023-09-05: the two constraint graphs; left is the modified version, right is the original]

As I understand it, the introduction of an additional copy edge (60 -> 63 in the left graph above) should not matter much, and I would have expected the same results as for the original version. What am I missing?
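
The expectation about the copy edge can be illustrated with a toy inclusion-based (Andersen-style) solver. This is a minimal, field-insensitive sketch written for this thread, not SVF's implementation: all node names are hypothetical, and it does not model SVF's field sensitivity or how SVF types the object created by an i8 alloca, so it says nothing about why SVF reports no targets.

```c
#include <string.h>

/* Top-level pointers first, then abstract memory objects. */
enum {
    MYALLOCA, S, FN5, FN6, TMP, LOADED, CALLEE,
    OBJ_FRAME, OBJ_INDIR1,
    NUM_NODES
};

typedef enum { ADDR, COPY, LOAD, STORE } Kind;
typedef struct { Kind kind; int dst, src; } Constraint;

/* pts[n] is a bitmask of the objects that node n may point to. */
static unsigned pts[NUM_NODES];

/* Union `set` into pts[dst]; returns 1 if pts[dst] grew. */
static int merge_into(int dst, unsigned set)
{
    unsigned before = pts[dst];
    pts[dst] |= set;
    return pts[dst] != before;
}

/* Naive fixpoint iteration over all constraints. */
static void solve(const Constraint *cs, int n)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            const Constraint *c = &cs[i];
            switch (c->kind) {
            case ADDR:  /* dst = &src */
                changed |= merge_into(c->dst, 1u << c->src);
                break;
            case COPY:  /* dst = src */
                changed |= merge_into(c->dst, pts[c->src]);
                break;
            case LOAD:  /* dst = *src */
                for (int o = 0; o < NUM_NODES; o++)
                    if (pts[c->src] & (1u << o))
                        changed |= merge_into(c->dst, pts[o]);
                break;
            case STORE: /* *dst = src */
                for (int o = 0; o < NUM_NODES; o++)
                    if (pts[c->dst] & (1u << o))
                        changed |= merge_into(o, pts[c->src]);
                break;
            }
        }
    }
}

/* Models the byte alloca + bitcast variant; returns the points-to set of the callee. */
unsigned callee_targets(void)
{
    static const Constraint cs[] = {
        { ADDR,  MYALLOCA, OBJ_FRAME  },  /* %MyAlloca = alloca i8, i64 64 */
        { COPY,  S,        MYALLOCA   },  /* %s = bitcast (the extra copy edge) */
        { COPY,  FN5,      S          },  /* GEP to field 0, field-insensitive */
        { ADDR,  TMP,      OBJ_INDIR1 },  /* address of @indir1 */
        { STORE, FN5,      TMP        },  /* store into s.fn */
        { COPY,  FN6,      S          },  /* second GEP to field 0 */
        { LOAD,  LOADED,   FN6        },  /* load s.fn */
        { COPY,  CALLEE,   LOADED     },  /* %callee.knr.cast = bitcast */
    };
    memset(pts, 0, sizeof pts);
    solve(cs, (int)(sizeof cs / sizeof cs[0]));
    return pts[CALLEE];
}
```

In this toy model the callee resolves to the indir1 object with or without the extra COPY from MYALLOCA to S, which is the behaviour the question above expects from a plain Andersen-style analysis.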

The final modified code is:

%struct.mystruct = type { void (...)* }
define void @indir1() #0 !dbg !13 { ... }
define void @indir2() #0 !dbg !19 { ... }

define i32 @LLVMFuzzerTestOneInput(i8* noundef %data, i64 noundef %size) #0 !dbg !22 {
entry:
  %retval = alloca i32, align 4
  %data.addr = alloca i8*, align 8
  %size.addr = alloca i64, align 8
  ; %MyAlloca = alloca [64 x i8], align 32
  %MyAlloca = alloca i8, i64 64, align 32
  ; %MyAlloca.gep = getelementptr inbounds [64 x i8], [64 x i8]* %MyAlloca, i64 0, i64 32
  ; %s = alloca %struct.mystruct, align 8
  ; %s = bitcast i8* %MyAlloca.gep to %struct.mystruct*
  ; %s = bitcast [64 x i8]* %MyAlloca to %struct.mystruct*
  %s = bitcast i8* %MyAlloca to %struct.mystruct*
  %fn = alloca void (...)*, align 8
  store i8* %data, i8** %data.addr, align 8
  call void @llvm.dbg.declare(metadata i8** %data.addr, metadata !34, metadata !DIExpression()), !dbg !35
  store i64 %size, i64* %size.addr, align 8
  call void @llvm.dbg.declare(metadata i64* %size.addr, metadata !36, metadata !DIExpression()), !dbg !37
  %0 = load i64, i64* %size.addr, align 8, !dbg !38
  %cmp = icmp ult i64 %0, 2, !dbg !40
  br i1 %cmp, label %if.then, label %if.end, !dbg !41

if.then:                                          ; preds = %entry
  store i32 0, i32* %retval, align 4, !dbg !42
  br label %return, !dbg !42

if.end:                                           ; preds = %entry
  call void @llvm.dbg.declare(metadata %struct.mystruct* %s, metadata !44, metadata !DIExpression()), !dbg !51
  call void @llvm.dbg.declare(metadata void (...)** %fn, metadata !52, metadata !DIExpression()), !dbg !53
  store void (...)* null, void (...)** %fn, align 8, !dbg !53
  %1 = load i8*, i8** %data.addr, align 8, !dbg !54
  %arrayidx = getelementptr inbounds i8, i8* %1, i64 0, !dbg !54
  %2 = load i8, i8* %arrayidx, align 1, !dbg !54
  %conv = zext i8 %2 to i32, !dbg !54
  %cmp1 = icmp slt i32 %conv, 42, !dbg !56
  br i1 %cmp1, label %if.then3, label %if.else, !dbg !57

if.then3:                                         ; preds = %if.end
  store void (...)* bitcast (void ()* @indir1 to void (...)*), void (...)** %fn, align 8, !dbg !58
  br label %if.end4, !dbg !60

if.else:                                          ; preds = %if.end
  store void (...)* bitcast (void ()* @indir2 to void (...)*), void (...)** %fn, align 8, !dbg !61
  br label %if.end4

if.end4:                                          ; preds = %if.else, %if.then3
  %3 = load void (...)*, void (...)** %fn, align 8, !dbg !63
  %fn5 = getelementptr inbounds %struct.mystruct, %struct.mystruct* %s, i32 0, i32 0, !dbg !64
  store void (...)* %3, void (...)** %fn5, align 8, !dbg !65
  %fn6 = getelementptr inbounds %struct.mystruct, %struct.mystruct* %s, i32 0, i32 0, !dbg !66
  %4 = load void (...)*, void (...)** %fn6, align 8, !dbg !66
  %callee.knr.cast = bitcast void (...)* %4 to void ()*, !dbg !67
  call void %callee.knr.cast(), !dbg !67
  store i32 0, i32* %retval, align 4, !dbg !68
  br label %return, !dbg !68

return:                                           ; preds = %if.end4, %if.then
  %5 = load i32, i32* %retval, align 4, !dbg !69
  ret i32 %5, !dbg !69
}

I can provide additional material (graphs, code, etc.) if requested.

acidghost avatar Sep 05 '23 21:09 acidghost

@xudon9 could you take a look at this issue?

yuleisui avatar Sep 06 '23 09:09 yuleisui

@yuleisui @xudon9 Any update on the issue?

acidghost avatar Sep 19 '23 06:09 acidghost

Hi, did you eliminate all ptr2int/int2ptr instructions manually? I managed to reproduce the problem with an even simpler sample below, where foo() contains more than 15 integer/pointer casts.

struct S { void (*fn)(); };
void indir1() {}
void foo()
{
    struct S s;
    s.fn = indir1;
    s.fn();
}

Can you upload an IR file compiled from this snippet, with ASan turned on, without int2ptr/ptr2int?

xudon9 avatar Sep 19 '23 12:09 xudon9

Hi, thanks for the reply.

My initial intention was to modify the IR output by ASan, so I tried to manually replace the inttoptr and ptrtoint instructions with a GEP and a bitcast. Because this was unsuccessful, I tried to reduce the scope of the IR by instead modifying the initial code compiled w/o ASan:

  • replacing the alloca for the struct with an alloca as ASan would have done
  • adding GEP and bitcast

This was also unsuccessful. Same story if I just replace the alloca for the struct with an alloca of some bytes and a bitcast.

So I'm not sure what you want me to provide:

  • snippet compiled with ASan but with the relevant pointer arithmetic done with GEPs?
  • snippet compiled w/o ASan but with alloca for the struct replaced with an alloca of N bytes?

acidghost avatar Sep 19 '23 12:09 acidghost

Let's try this first.

  • snippet compiled w/o ASan but with alloca for the struct replaced with an alloca of N bytes?

xudon9 avatar Sep 19 '23 12:09 xudon9

  • snippet compiled w/o ASan but with alloca for the struct replaced with an alloca of N bytes?

; ModuleID = 'gh-issue-noasan.c'
source_filename = "gh-issue-noasan.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx13.0.0"

%struct.S = type { void (...)* }

; Function Attrs: noinline nounwind optnone ssp uwtable
define void @indir1() #0 !dbg !13 {
entry:
  ret void, !dbg !17
}

; Function Attrs: noinline nounwind optnone ssp uwtable
define void @foo() #0 !dbg !18 {
entry:
  ; %s = alloca %struct.S, align 8
  %MyAlloca = alloca i8, i64 64, align 32
  %s = bitcast i8* %MyAlloca to %struct.S*
  call void @llvm.dbg.declare(metadata %struct.S* %s, metadata !19, metadata !DIExpression()), !dbg !26
  %fn = getelementptr inbounds %struct.S, %struct.S* %s, i32 0, i32 0, !dbg !27
  store void (...)* bitcast (void ()* @indir1 to void (...)*), void (...)** %fn, align 8, !dbg !28
  %fn1 = getelementptr inbounds %struct.S, %struct.S* %s, i32 0, i32 0, !dbg !29
  %0 = load void (...)*, void (...)** %fn1, align 8, !dbg !29
  %callee.knr.cast = bitcast void (...)* %0 to void ()*, !dbg !30
  call void %callee.knr.cast(), !dbg !30
  ret void, !dbg !31
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

attributes #0 = { noinline nounwind optnone ssp uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+v8.5a,+zcm,+zcz" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8, !9, !10, !11}
!llvm.ident = !{!12}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None, sysroot: "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk", sdk: "MacOSX.sdk")
!1 = !DIFile(filename: "gh-issue-noasan.c", directory: "/Users/acidghost/wa/oss/SVF")
!2 = !{i32 7, !"Dwarf Version", i32 4}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = !{i32 1, !"wchar_size", i32 4}
!5 = !{i32 1, !"branch-target-enforcement", i32 0}
!6 = !{i32 1, !"sign-return-address", i32 0}
!7 = !{i32 1, !"sign-return-address-all", i32 0}
!8 = !{i32 1, !"sign-return-address-with-bkey", i32 0}
!9 = !{i32 7, !"PIC Level", i32 2}
!10 = !{i32 7, !"uwtable", i32 1}
!11 = !{i32 7, !"frame-pointer", i32 1}
!12 = !{!"clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)"}
!13 = distinct !DISubprogram(name: "indir1", scope: !1, file: !1, line: 2, type: !14, scopeLine: 2, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !16)
!14 = !DISubroutineType(types: !15)
!15 = !{null}
!16 = !{}
!17 = !DILocation(line: 2, column: 16, scope: !13)
!18 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !14, scopeLine: 4, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !16)
!19 = !DILocalVariable(name: "s", scope: !18, file: !1, line: 5, type: !20)
!20 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "S", file: !1, line: 1, size: 64, elements: !21)
!21 = !{!22}
!22 = !DIDerivedType(tag: DW_TAG_member, name: "fn", scope: !20, file: !1, line: 1, baseType: !23, size: 64)
!23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !24, size: 64)
!24 = !DISubroutineType(types: !25)
!25 = !{null, null}
!26 = !DILocation(line: 5, column: 14, scope: !18)
!27 = !DILocation(line: 6, column: 7, scope: !18)
!28 = !DILocation(line: 6, column: 10, scope: !18)
!29 = !DILocation(line: 7, column: 7, scope: !18)
!30 = !DILocation(line: 7, column: 5, scope: !18)
!31 = !DILocation(line: 8, column: 1, scope: !18)

==================Function Pointer Targets==================

NodeID: 44
CallSite:    call void %callee.knr.cast(), !dbg !28 { "ln": 7, "cl": 5, "fl": "gh-issue-noasan.c" }     Location: { "ln": 7, "cl": 5, "fl": "gh-issue-noasan.c" }
        !!!has no targets!!!

acidghost avatar Sep 19 '23 12:09 acidghost

@yuleisui @xudon9 Any update on the issue?

acidghost avatar Nov 10 '23 06:11 acidghost