
mlir-cpu-runner/async-group.mlir fails and freezes the test suite

Open sylvestre opened this issue 3 years ago • 1 comments

log: https://llvm-jenkins.debian.net/job/llvm-toolchain-binaries/architecture=i386,distribution=unstable,label=i386/680/console

Testing: 0  2  4  6  8  10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 
FAIL: MLIR :: mlir-cpu-runner/async-group.mlir (1614 of 1617)
******************** TEST 'MLIR :: mlir-cpu-runner/async-group.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';     /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-opt /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir -pass-pipeline="async-to-async-runtime,func.func(async-runtime-ref-counting,async-runtime-ref-counting-opt),convert-async-to-llvm,func.func(convert-arith-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts"  | /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner                                                           -e main -entry-point-result=void -O0                                    -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_c_runner_utils.so       -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_runner_utils.so         -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_async_runtime.so    | /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/FileCheck /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir
--
Exit Code: 2

Command Output (stderr):
--
free(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner -e main -entry-point-result=void -O0 -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_c_runner_utils.so -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_runner_utils.so -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_async_runtime.so
 #0 0xf0e19e71 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:569:13
 #1 0xf0e1a0f0 PrintStackTraceSignalHandler(void*) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:635:3
 #2 0xf0e17d8c llvm::sys::RunSignalHandlers() build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Signals.cpp:104:20
 #3 0xf0e1a42d SignalHandler(int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:0:3
 #4 0xf7f89570 (linux-gate.so.1+0x570)
 #5 0xf7f89559 (linux-gate.so.1+0x559)
 #6 0xefffcec7 (/lib/i386-linux-gnu/libc.so.6+0x85ec7)
 #7 0xeffadb41 raise (/lib/i386-linux-gnu/libc.so.6+0x36b41)
 #8 0xeff97262 abort (/lib/i386-linux-gnu/libc.so.6+0x20262)
 #9 0xeffefc6c (/lib/i386-linux-gnu/libc.so.6+0x78c6c)
#10 0xf000837d (/lib/i386-linux-gnu/libc.so.6+0x9137d)
#11 0xf0009e53 (/lib/i386-linux-gnu/libc.so.6+0x92e53)
#12 0xf000c802 cfree (/lib/i386-linux-gnu/libc.so.6+0x95802)
#13 0xf0359818 operator delete(void*) (/lib/i386-linux-gnu/libstdc++.so.6+0x88818)
#14 0xead7afda mlir::runtime::AsyncToken::~AsyncToken() build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:173:8
#15 0xead7b04c mlir::runtime::(anonymous namespace)::RefCounted::destroy() build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:149:41
#16 0xead792c3 mlirAsyncRuntimeDropRef build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:237:1
#17 0xf7f7e0a8 
#18 0xf7f7e4a8 
#19 0x56662543 compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:250:3
#20 0x5665ed72 compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:267:10
#21 0x5665d9b0 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:402:23
#22 0x565c3aa0 main build-llvm/tools/clang/stage2-bins/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:0:10
#23 0xeff983b5 (/lib/i386-linux-gnu/libc.so.6+0x213b5)
#24 0xeff9847f __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x2147f)
#25 0x565c3877 _start (/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner+0x16877)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/FileCheck /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir

It freezes the execution of the test suite.

(I am not 100% sure that this is the test causing the test suite to freeze.)

sylvestre avatar Oct 14 '22 12:10 sylvestre

@llvm/issue-subscribers-mlir

llvmbot avatar Oct 14 '22 12:10 llvmbot

I've seen this i686 failure as well. We currently don't ship the MLIR tools in Fedora because of these failures (they don't occur when the tools are not built).

nikic avatar Oct 19 '22 12:10 nikic

I disabled the testsuite on i386 to avoid this

sylvestre avatar Nov 11 '22 16:11 sylvestre

I took a brief look at this, and in the generated MLIR (presumably produced by the convert-memref-to-llvm pass) I already see things like this:

  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @free(!llvm.ptr<i8>)
  llvm.func @aligned_alloc(i64, i64) -> !llvm.ptr<i8>

So it seems like at least this part of MLIR has a hardcoded assumption that it runs on a 64-bit architecture.
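
To make the concern concrete, here is a rough sketch (not taken from the MLIR sources) of what can go wrong when a caller passes i64 arguments to a C function whose parameters are 32-bit `size_t`, as they are on i386. It assumes the i386 stack-based calling convention, where a 64-bit argument occupies two consecutive 4-byte slots. The helper names are hypothetical, and whether this exact mismatch is what triggers the `free(): invalid pointer` abort is not established in this thread.

```python
import struct

def caller_stack_i64(*args):
    """Lay out arguments the way a caller that treats every one as i64 would
    (little-endian, 8 bytes each)."""
    return b"".join(struct.pack("<q", a) for a in args)

def callee_view_i32(stack, nargs):
    """Read the first nargs 4-byte slots, as a callee with 32-bit size_t would."""
    return struct.unpack_from("<" + "I" * nargs, stack)

# Intended call: aligned_alloc(alignment=64, size=100)
stack = caller_stack_i64(64, 100)
alignment, size = callee_view_i32(stack, 2)
print(alignment, size)  # 64 0
```

In this model the callee sees the upper half of the first argument as its second argument, so the requested size silently becomes 0.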

nikic avatar Dec 22 '22 17:12 nikic

Okay, apparently MLIR has a concept of an "index type" that should handle this. The memref dialect does respect the index type, e.g. here: https://github.com/llvm/llvm-project/blob/de8e0a439777014d7d85007c379579e58bba2efe/mlir/lib/Conversion/MemRefToLLVM/AllocLikeConversion.cpp#L126

The async dialect hardcodes i64 for all sizes: https://github.com/llvm/llvm-project/blob/de8e0a439777014d7d85007c379579e58bba2efe/mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp#L379-L381

Something I don't get yet is how the index type is determined. It looks like even the malloc created by memref also uses i64 on i686. I'd have expected it to use i32.

nikic avatar Dec 23 '22 14:12 nikic

Looks like the index type is part of LowerToLLVMOptions and determined either from datalayout or an index bitwidth override.
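
A simplified model of that selection logic, as I understand it: use the override if one is given, otherwise fall back to the pointer size from the data layout string. This is an illustration only, not MLIR's implementation. The function names are made up, the parser only handles `p[n]:<size>:...` specs, and it assumes that "from datalayout" means the pointer size of address space 0.

```python
def pointer_bits(data_layout: str, addrspace: int = 0) -> int:
    """Return the pointer size in bits for addrspace from an LLVM data layout string."""
    for spec in data_layout.split("-"):
        if not spec.startswith("p"):
            continue
        head, *fields = spec.split(":")
        space = int(head[1:] or 0)  # "p" alone means address space 0
        if space == addrspace and fields:
            return int(fields[0])
    return 64  # LLVM's default when the layout has no matching pointer spec

def index_bitwidth(data_layout: str, override: int = 0) -> int:
    """A non-zero override wins; otherwise derive the width from the data layout."""
    return override if override else pointer_bits(data_layout)

I386 = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
X86_64 = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

print(index_bitwidth(I386))        # 32
print(index_bitwidth(X86_64))      # 64
print(index_bitwidth(X86_64, 32))  # 32
```

If mlir-opt runs without a target-specific data layout, a model like this falls back to 64, which would be consistent with seeing i64 on i686.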

But how is a generic mlir-opt call that is intended for use with mlir-cpu-runner to know the right option for the target? I don't see any obvious way it could use the host index width -- and even manually passing it in seems like a big hassle, as one would have to pass an indexBitwidth option to a bunch of passes.
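
For reference, this is roughly what the manual route would look like: a small script that takes the pipeline from the failing RUN line and appends an `index-bitwidth` option to selected passes, using the pointer width of the machine running the script. It is only a sketch. The set of passes that accept `index-bitwidth` is an assumption here and should be checked against `mlir-opt --help` for the build in question, and it does nothing about the async lowering, which hardcodes i64 as noted above.

```python
import re
import struct

# Pointer width of the host running this script (32 on i386, 64 on x86_64).
HOST_INDEX_BITS = struct.calcsize("P") * 8

def with_index_bitwidth(pipeline: str, passes: set, bits: int) -> str:
    """Append {index-bitwidth=<bits>} to every pass in `passes` within a pipeline string."""
    def annotate(match):
        name = match.group(0)
        return f"{name}{{index-bitwidth={bits}}}" if name in passes else name
    # Pass and anchor names are maximal runs of letters, digits, '_', '.' and '-'.
    return re.sub(r"[A-Za-z0-9_.-]+", annotate, pipeline)

PIPELINE = (
    "async-to-async-runtime,"
    "func.func(async-runtime-ref-counting,async-runtime-ref-counting-opt),"
    "convert-async-to-llvm,func.func(convert-arith-to-llvm),"
    "convert-func-to-llvm,reconcile-unrealized-casts"
)
# Assumption: these two passes accept an index-bitwidth option.
ASSUMED_PASSES = {"convert-arith-to-llvm", "convert-func-to-llvm"}

print(with_index_bitwidth(PIPELINE, ASSUMED_PASSES, HOST_INDEX_BITS))
```

Every RUN line in the test suite would need the same treatment, which is the hassle described above.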

nikic avatar Dec 23 '22 15:12 nikic