Benchmarking local_laplacian segfaults

Open rootjalex opened this issue 4 years ago • 3 comments

This is with the adams2019 autoscheduler, on master. The following series of commands is run from apps/local_laplacian:

make clean
make bin/host/local_laplacian.generator

# Make a runtime
./bin/host/local_laplacian.generator -r runtime -o bin/host target=host

c++ -std=c++17 -O3 -c ../../tools/RunGenMain.cpp -o bin/RunGenMain.o -I ../../distrib/include -I /opt/local/include

mkdir -p results

HL_PERMIT_FAILED_UNROLL=1 \
HL_SEED=256 \
HL_RANDOM_DROPOUT=1 \
HL_BEAM_SIZE=1 \
./bin/host/local_laplacian.generator -g local_laplacian -e stmt,static_library,h,assembly,registration,compiler_log,llvm_assembly -o results -p ../../distrib/lib/libautoschedule_adams2019.dylib target=host-no_runtime-disable_llvm_loop_opt auto_schedule=true -s Adams2019

c++ -std=c++17 results/*.{cpp,a} bin/RunGenMain.o bin/host/runtime.a -I ../../distrib/include/ -L/opt/local/lib -ljpeg -lpng -ltiff -lpthread -ldl -o results/benchmark

results/benchmark --benchmark_min_time=0 --track_memory --benchmarks=all --default_input_buffers=random:0:estimate_then_auto --default_input_scalars --output_extents=estimate --parsable_output

Output:

rm -rf bin
c++ -O3 -std=c++17 -I /Users/alexanderroot/Projects/Halide-auto/distrib/include/ -I /Users/alexanderroot/Projects/Halide-auto/distrib/tools/  -Wall -Werror -Wno-unused-function -Wcast-qual -Wignored-qualifiers -Wno-comment -Wsign-compare -Wno-unknown-warning-option -Wno-psabi -stdlib=libc++ -fvisibility=hidden local_laplacian_generator.cpp /Users/alexanderroot/Projects/Halide-auto/distrib/tools/GenGen.cpp -o bin/host/local_laplacian.generator -Wl,-rpath,/Users/alexanderroot/Projects/Halide-auto/distrib/lib/ -L /Users/alexanderroot/Projects/Halide-auto/distrib/lib/ -lHalide -L/usr/local/opt/llvm/lib -ldl -lpthread -lz -Wl,-force_load /Users/alexanderroot/Projects/Halide-auto/distrib/lib/libautoschedule_adams2019.dylib
generate_schedule for target=x86-64-osx-avx-avx2-disable_llvm_loop_opt-f16c-fma-no_runtime-sse41
Pass 0 of 1, cost: 84.0059, time (ms): 4636                                     
Best cost: 84.0059
Cache (block) hits: 0
Cache (block) misses: 977
Warning:
Not folding Func f152 along dimension v1 because there is vectorized access to that Func in that dimension and storage folding was not explicitly requested in the schedule. In previous versions of Halide this would have folded with factor 8. To restore the old behavior add f152.fold_storage(v1, 8) to your schedule.
Warning:
Not folding Func f103 along dimension v1 because there is vectorized access to that Func in that dimension and storage folding was not explicitly requested in the schedule. In previous versions of Halide this would have folded with factor 32. To restore the old behavior add f103.fold_storage(v1, 32) to your schedule.
Warning:
Not folding Func f156 along dimension v1 because there is vectorized access to that Func in that dimension and storage folding was not explicitly requested in the schedule. In previous versions of Halide this would have folded with factor 8. To restore the old behavior add f156.fold_storage(v1, 8) to your schedule.
Warning:
HL_PERMIT_FAILED_UNROLL is allowing us to unroll a non-constant loop into a serial loop. Did you mean to do this?
Warning:
HL_PERMIT_FAILED_UNROLL is allowing us to unroll a non-constant loop into a serial loop. Did you mean to do this?
ld: warning: directory not found for option '-L/opt/local/lib'
Warning: Using --track_memory with --benchmarks will produce inaccurate benchmark results.
./error.sh: line 21: 46537 Segmentation fault: 11  results/benchmark --benchmark_min_time=0 --track_memory --benchmarks=all --default_input_buffers=random:0:estimate_then_auto --default_input_scalars --output_extents=estimate --parsable_output

LLDB output:

(lldb) run
Process 46559 launched: '/Users/alexanderroot/Projects/Halide-auto/apps/local_laplacian/results/benchmark' (x86_64)
Warning: Using --track_memory with --benchmarks will produce inaccurate benchmark results.
Process 46559 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010000ee33 benchmark`local_laplacian.par_for.output.s0.v1.v1 + 6691
benchmark`local_laplacian.par_for.output.s0.v1.v1:
->  0x10000ee33 <+6691>: vmovaps 0x1f28(%rsp), %ymm1
    0x10000ee3c <+6700>: vmovaps 0x1f40(%rsp), %ymm2
    0x10000ee45 <+6709>: vmovaps 0x1f48(%rsp), %ymm3
    0x10000ee4e <+6718>: vshufps $0xdd, %ymm3, %ymm1, %ymm4 ; ymm4 = ymm1[1,3],ymm3[1,3],ymm1[5,7],ymm3[5,7] 
  thread #9, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010000ee33 benchmark`local_laplacian.par_for.output.s0.v1.v1 + 6691
benchmark`local_laplacian.par_for.output.s0.v1.v1:
->  0x10000ee33 <+6691>: vmovaps 0x1f28(%rsp), %ymm1
    0x10000ee3c <+6700>: vmovaps 0x1f40(%rsp), %ymm2
    0x10000ee45 <+6709>: vmovaps 0x1f48(%rsp), %ymm3
    0x10000ee4e <+6718>: vshufps $0xdd, %ymm3, %ymm1, %ymm4 ; ymm4 = ymm1[1,3],ymm3[1,3],ymm1[5,7],ymm3[5,7] 

rootjalex avatar Aug 11 '21 14:08 rootjalex

It's an aligned load from the stack. So either it's a stack overflow, or that address is not aligned. Assuming the stack pointer is 32-byte aligned, the offset 0x1f28 leaves that address only 8-byte aligned, which is not enough for a 256-bit vmovaps (a ymm load needs 32-byte alignment). So this is a miscompilation. Perhaps we're emitting bad alignment info in CodeGen_LLVM?

abadams avatar Aug 11 '21 14:08 abadams

This could be a bug in modulus_remainder.

abadams avatar Aug 11 '21 14:08 abadams

Is this still active? Does it need investigation?

steven-johnson avatar Jul 12 '22 18:07 steven-johnson