Crash / ASan issues using VmModule.wrap_buffer from Python
(Lots of red herrings here, see https://github.com/iree-org/iree/issues/17635#issuecomment-2163644331 for latest issue description)
Splitting off of https://github.com/llvm/torch-mlir/issues/3433 to discuss IREE-specific details.
I wrote some test cases for https://pytorch.org/docs/stable/generated/torch.Tensor.index_put_.html starting from PyTorch and going through iree-turbine / torch-mlir to IREE: https://gist.github.com/ScottTodd/1e95795e79d17964078217ca98a3a398. Latest results:
test_single_value | PASS
test_multiple_values | PASS (then crash)
test_broadcast_value_along_axis | FAIL
test_broadcast_value_along_indices | FAIL
test_broadcast_values_along_axis | PASS (then crash)
The two "pass then crash" test cases used to pass without crashing and I found some interesting results bisecting through IREE releases.
| release | test result |
|---|---|
| iree_compiler==20240514.893 | PASS --> CRASH |
| iree_compiler==20240508.887 | |
| iree_compiler==20240507.886 | Error invoking function ... |
| iree_compiler==20240506.885 | |
| iree_compiler==20240505.884 | |
| iree_compiler==20240504.883 | |
| iree_compiler==20240503.882 | |
| iree_compiler==20240502.881 | |
| iree_compiler==20240501.880 | Error invoking function ... |
| iree_compiler==20240430.879 | |
| iree_compiler==20240429.878 | |
| iree_compiler==20240428.877 | |
| iree_compiler==20240427.876 | Error invoking function ... |
| iree_compiler==20240426.875 | |
| iree_compiler==20240425.874 | Error invoking function ... |
| iree_compiler==20240424.873 | Error invoking function ... |
| iree_compiler==20240423.872 | PASS --> CRASH |
| iree_compiler==20240422.871 | |
| iree_compiler==20240421.870 | |
| iree_compiler==20240420.869 | PASS --> CRASH |
| iree_compiler==20240419.868 | PASS --> CRASH |
| iree_compiler==20240418.867 | PASS |
| iree_compiler==20240417.866 | PASS |
| iree_compiler==20240416.865 | |
| iree_compiler==20240415.864 | |
| iree_compiler==20240414.863 | |
| iree_compiler==20240412.861 | |
| iree_compiler==20240411.860 | |
| iree_compiler==20240410.859 | PASS |
Crash (during shutdown, maybe from writing out of bounds?):
======================================== 1 passed in 3.60s ======================================== Exception Code: 0xC0000005
#0 0x00007ffa6c89a495 PyInit__runtime (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0x4a495)
...
#5 0x00007ffa6c91bf1c nanobind::python_error::what(void) const (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0xcbf1c)
"Error invoking function" (fixed by https://github.com/iree-org/iree/pull/17339):
def _invoke(self, arg_list, ret_list):
> self._vm_context.invoke(self._vm_function, arg_list, ret_list)
E ValueError: Error invoking function: D:\a\iree\iree\c\runtime\src\iree\hal\command_buffer_validation.c:363: INVALID_ARGUMENT; source and target ranges overlap within the same buffer; while invoking native function hal.command_buffer.copy_buffer; while calling import;
E [ 2] native hal.command_buffer.copy_buffer:0 -
E [ 1] bytecode module.main$async:526 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0
E [ 0] bytecode module.main:62 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0
20240418.867 to 20240419.868, pass --> crash
Between 20240418.867 and 20240419.868, the test_multiple_values test case started crashing.
before: https://github.com/iree-org/iree/releases/tag/candidate-20240418.867 after: https://github.com/iree-org/iree/releases/tag/candidate-20240419.868
diff (https://github.com/iree-org/iree/compare/ff624dd8b27029f4d69f884a15ff3af8606b15b9...a2476ceef53f6d4ec0f7a9cf36892647626d324b):
λ git log --oneline candidate-20240418.867..candidate-20240419.868
a2476ceef5 [metal] Disable failing semaphore submission test until fixing (#17100)
0e1e6bfe7a Clarify fusion heuristic (#17098)
125f4200d8 [CodeGen] Add a pattern to fold extract_slice consumer into xfer.write. (#17067)
f755b42289 [Codegen] Add folding in createBoundedTileSize for partially dynamic wgSize. (#17089)
a0b4853dc7 Fixes to enable out-of-tree plugin builds. (#17095)
bd1b10626c [CodeGen] Drop encoding for HAL and Flow ops when DT is not supported. (#17081)
886c4169ac [Winograd] Use TilingInterface for all levels of winograd op tiling (#17061)
244195960f Add GPU dialect dependencies to C/Python bindings. (#17090)
ab4babee07 [ConstEval] Add flag to adjust tensor size limit for hoisting (#17064)
eec081c2c6 [LLVMGPU] Fallback if dynamic dim found on vector distribute. (#17085)
f86f21cd80 [python] Expose MLIR python bindings for gpu and transform (#17088)
4778f5f909 Categorize matmul/metvec like generic ops for dispatches (#17084)
f32a87ccd2 [Flow] Move elementwise op fusion and bubble up expand shapes patterns into their own pass. (#17068)
3677fbcce8 [runtime] Add semaphore test where 2 batches wait on a former batch amongst 2 (#17080)
20240423.872 to 20240424.873 crash --> error invoking function
before: https://github.com/iree-org/iree/releases/tag/candidate-20240423.872 after: https://github.com/iree-org/iree/releases/tag/candidate-20240424.873
diff (https://github.com/iree-org/iree/compare/f5660ee72bd8e71910d00cec726375fb4b4a6de8...59532d30c0b8b3a0cdf261eaa4e69784c9fddeee):
λ git log --oneline candidate-20240423.872..candidate-20240424.873
59532d30c0 Extend DecomposeConvolutionToLowerDimOpsPass (#17069)
1b4c76f931 Adding a hoistable attr interface to allow attaching attrs to hoists. (#17139)
729ebc642f [runtime][metal] exclude properly the failing semaphore test (#17151)
30acc53257 [runtime][cts] add test where a device batch signals another and the host (#17138)
94728e34c2 [LinalgExt][Winograd] Add winograd.filter_transform op (#17102)
4bc90e7431 Run pytorch/models tests on AMDGPU with Vulkan. (#17129)
330651efa6 [plugins][ROCM] Fix minor loc source resize bug (#17133)
f7098e3f66 Moving regression suite to azure (#17140)
290d812d96 [runtime][cts] add semaphore test where a batch waits on another and a host signal (#17130)
8736479f5d Prevent stream.async.update folding unless confirmed safe. (#17135)
7192e8c195 [Preprocessing] Add support for general ContractionOps and handle dynamic dimension using pad + expand. (#17123)
d7db353adf Add initial website documentation for ONNX frontend. (#17004)
568bb31780 [runtime][cts] Add test waiting on a semaphore for finite time and fix Vulkan driver (#17126)
655b71a282 Executable library call hooks system, and a sample Linux/CPU event implementation (#15803)
3dde925d11 [VectorDistribution] Add distribution pattern for vector::MultiDimReductionOp (#17076)
Oooh, the test crash is flaky. That makes bisecting trickier... would be easier on Linux with ASan.
I tried around 10 times at iree_compiler==20240418.867 and can't repro there.
Still verifying with ASan on Linux to avoid the flakes, but my bisect on Windows seems to be pointing to https://github.com/iree-org/iree/commit/f32a87ccd2 for the test crash. That's "mostly" an NFC according to @MaheshRavishankar 🤔
Bah, having trouble pinning this down.
- Linux segfaults in my Python unit test on package versions as old as
20240410.859(and possibly earlier). - Linux source builds with ASan at tip of tree report no issues with the test case when run through the
iree-compileandiree-run-moduleCLI tools. ~~Trying to also run through the Python bindings but hittingImportError: cannot import name 'ir' from 'iree.compiler._mlir_libs._mlir'and other venv setup issues.~~ (had built in one venv and then tried to use from another venv) - On Windows I diffed the
--mlir-print-ir-after-allbefore and after https://github.com/iree-org/iree/commit/f32a87ccd2 and there aren't many changes: codegen IR is identical beforeSerializeTargetExecutablesPassand a few passes moved around but that's about it.
:/ yeah, .vmfb files produced before and after https://github.com/iree-org/iree/commit/f32a87ccd2 are identical and that commit didn't modify runtime code or python bindings. That's what my bisect pointed to though.
Going to keep trying with ASan on Linux to see if I can get a 100% repro case instead of the flaky crashes on Windows.
Following https://iree.dev/developers/debugging/sanitizers/#asan-addresssanitizer on Linux got me a source Python build with ASan on Linux that reports errors in the test case I've been working with:
(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so pytest --log-cli-level=info index_put_test.py::TestIndexPut::test_multiple_values
================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.0rc1, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/scotttodd/scratch/tests
collected 1 item
index_put_test.py::TestIndexPut::test_multiple_values
----------------------------------------------------------------------------------------------------- live log call ------------------------------------------------------------------------------------------------------
INFO index_put_test:index_put_test.py:110 module @module {
func.func @main(%arg0: !torch.tensor<[3,6],f32>) -> !torch.vtensor<[3,6],f32> attributes {torch.assume_strict_symbolic_shapes} {
%0 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
%1 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64_1> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
%2 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.float32> : tensor<3xf32>) : !torch.vtensor<[3],f32>
%3 = torch.copy.to_vtensor %arg0 : !torch.vtensor<[3,6],f32>
%none = torch.constant.none
%4 = torch.aten.clone %0, %none : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
%none_0 = torch.constant.none
%5 = torch.aten.clone %1, %none_0 : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
%none_1 = torch.constant.none
%6 = torch.aten.clone %2, %none_1 : !torch.vtensor<[3],f32>, !torch.none -> !torch.vtensor<[3],f32>
%7 = torch.prim.ListConstruct %4, %5 : (!torch.vtensor<[3],si64>, !torch.vtensor<[3],si64>) -> !torch.list<optional<vtensor>>
%false = torch.constant.bool false
%8 = torch.aten.index_put %3, %7, %6, %false : !torch.vtensor<[3,6],f32>, !torch.list<optional<vtensor>>, !torch.vtensor<[3],f32>, !torch.bool -> !torch.vtensor<[3,6],f32>
torch.overwrite.tensor.contents %8 overwrites %arg0 : !torch.vtensor<[3,6],f32>, !torch.tensor<[3,6],f32>
return %8 : !torch.vtensor<[3,6],f32>
}
}
{-#
dialect_resources: {
builtin: {
torch_tensor_3_torch.int64: "0x08000000000000000000000001000000000000000200000000000000",
torch_tensor_3_torch.int64_1: "0x08000000030000000000000004000000000000000500000000000000",
torch_tensor_3_torch.float32: "0x04000000CDCCCC3DCDCC4C3E9A99993E"
}
}
#-}
INFO index_put_test:index_put_test.py:128 [[0. 0. 0. 0.1 0. 0. ]
[0. 0. 0. 0. 0.2 0. ]
[0. 0. 0. 0. 0. 0.3]]
PASSED [100%]
=================================================================================================== 1 passed in 5.95s ====================================================================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==78453==ERROR: AddressSanitizer: SEGV on unknown address 0x7f04a3c4d074 (pc 0x7f04a12ad2d8 bp 0x000000000000 sp 0x7ffe78c08730 T0)
==78453==The signal is caused by a READ memory access.
#0 0x7f04a12ad2d8 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
#1 0x7f04a1321d78 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
#2 0x7f04a1321b86 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
#3 0x7f04a125482d (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
#4 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#5 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#6 0x62a28d (/usr/bin/python3.11+0x62a28d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#7 0x654986 in PyGC_Collect (/usr/bin/python3.11+0x654986) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#8 0x646d40 in Py_FinalizeEx (/usr/bin/python3.11+0x646d40) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#9 0x668cfb in Py_Exit (/usr/bin/python3.11+0x668cfb) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#10 0x6555de (/usr/bin/python3.11+0x6555de) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#11 0x6555a5 in PyErr_PrintEx (/usr/bin/python3.11+0x6555a5) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#12 0x650a91 in _PyRun_SimpleFileObject (/usr/bin/python3.11+0x650a91) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#13 0x650832 in _PyRun_AnyFileObject (/usr/bin/python3.11+0x650832) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#14 0x64f786 in Py_RunMain (/usr/bin/python3.11+0x64f786) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#15 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#16 0x7f04a4429d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#17 0x7f04a4429e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#18 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
==78453==ABORTING
That's at tip of tree. Going to sync back to the suspected commit ranges to see if ASan similarly complains there.
Thanks @ScottTodd . If it does get back to that commit then let me know.
Having a hard time running my python test from an older commit.
-
Had to downgrade nanobind to build the IREE Python bindings from 2 months ago
-
Seeing this error when used together with the latest (pip installable)
iree-turbinepackage:Traceback: /usr/lib/python3.11/importlib/__init__.py:126: in import_module return _bootstrap._gcd_import(name[level:], package, level) index_put_test.py:12: in <module> import shark_turbine.aot as aot .venv/lib/python3.11/site-packages/shark_turbine/aot/__init__.py:7: in <module> from .builtins import * .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/__init__.py:7: in <module> from .globals import * .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/globals.py:12: in <module> from ..support.procedural import ( .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/__init__.py:13: in <module> from .base import * .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/base.py:22: in <module> from ....support.ir_imports import ( .venv/lib/python3.11/site-packages/shark_turbine/support/ir_imports.py:10: in <module> from iree.compiler.ir import ( ../../iree-build/compiler/bindings/python/iree/compiler/ir.py:5: in <module> from ._mlir_libs._mlir.ir import * ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:180: in <module> _site_initialize() ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:78: in _site_initialize from ._mlir import ir E ImportError: /home/scotttodd/iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/_mlir.cpython-311-x86_64-linux-gnu.so: undefined symbol: mlirSetGlobalDebugTypes, version IREE_API_0.0 -
Seeing alignment issues after switching from
iree-turbineto justfrom iree.compiler import compile_fileon an already exported .mlir file:compiled_module = compile_file("/home/scotttodd/scratch/tests/index_put_multiple_values.mlir", target_backends=["llvm-cpu"]) config = ireert.Config("local-sync") vm_module = ireert.load_vm_module( > ireert.VmModule.wrap_buffer( # config.vm_instance, compiled_module.map_memory() config.vm_instance, compiled_module ), config, ) E ValueError: VmModule.from_aligned_memory received an unaligned buffer. Got 0x0x625001860120, expected alignment 64
Argh, red herrings all around.
Running a trivial test case instead of index_put
class TorchModule(torch.nn.Module):
def forward(self, input):
return input + torch.ones(3, 4)
through Python in this setup with ASan on tip of tree also produces
==246621==ERROR: AddressSanitizer: SEGV on unknown address 0x7f96b787e050 (pc 0x7f96b43422d8 bp 0x000000000000 sp 0x7ffdeacb5240 T0)
==246621==The signal is caused by a READ memory access.
the problematic lines for the ASan error are
vm_module = ireert.load_vm_module(
ireert.VmModule.wrap_buffer(
config.vm_instance, compiled_module.map_memory()
),
config,
)
Soooo... why does my TestIndexPut::test_single_value test case succeed on Windows while the TestIndexPut::test_multiple_values test case succeeds then crashes? Maybe the test cases write to different elements, near the edges of the buffers:
# Passing
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.5000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000], # row 2
]
# Failing
[ # col 0 col 1 col 2 col 3 col 4 col 5
[0.0000, 0.0000, 0.0000, 0.1000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3000], # row 2
]
Then also why is this behavior so difficult to pin down? Is ASan missing something? Is Python a relevant detail? The index put operations are trying to do in-place processing that could be hitting a special path in Python vs through iree-run-module. Could try building a .c test file using the C API.
Maybe the test cases write to different elements, near the edges of the buffers:
Yep! This put of a single value also passes:
# indices=[torch.tensor([0]), torch.tensor([3])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.5000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000], # row 2
]
but these (writing a single value into the last row) crash:
# indices=[torch.tensor([2]), torch.tensor([3])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.5000], # row 2
]
# indices=[torch.tensor([2]), torch.tensor([0])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.5000, 0.0000, 0.0000, 0.0000], # row 2
]
Writing into the last row of a [3, 4] tensor crashes. Writing into the last row of a [4, 4] or [6, 6] tensor does not. The full model from https://github.com/nod-ai/sharktank/issues/22 was using [?,16,32,100]. I'm wondering if the dynamic dim was odd?
After restarting my machine, I'm only seeing the Python test crash in about 1/30 runs. That's going to make it hard to verify which test cases are definitely working and which aren't. Results were very consistent yesterday...
Bah, just saw a crash writing into the middle of a 4x4 tensor. The crash I'm seeing might be unique to the Python bindings and maybe not even unique to index_put_. Logs: https://gist.github.com/ScottTodd/a0c0e68d1abeb3240f782045c4c70e80
Aw, saw the same Python crash performing an elementwise add on a 4x4 tensor...
Minimal repro for the Python AddressSanitizer report (note: this shows up when using iree-turbine with example code like https://github.com/iree-org/iree-turbine/blob/4b451f84b03f87af21a9b785b0ddd68094f43ed8/examples/aot_mlp/mlp_export_simple.py#L45-L49)
Seems to be related to VmModule.wrap_buffer() and map_memory() on a compiler output. Using VmModule.copy_buffer() avoids the ASan issue.
import iree.runtime as ireert
from iree.compiler.api import (
Session,
Source,
Output,
)
session = Session()
session.set_flags("--iree-hal-target-backends=vmvx")
inv = session.invocation()
source = Source.wrap_buffer(
session,
b"""
builtin.module {
func.func @abs(%input : tensor<4xf32>) -> (tensor<4xf32>) {
%result = math.absf %input : tensor<4xf32>
return %result : tensor<4xf32>
}
}""",
)
inv.parse_source(source)
inv.execute()
out = Output.open_membuffer()
inv.output_vm_bytecode(out)
config = ireert.Config("local-sync")
# ASan issue goes away if this is commented out
vm_module = ireert.load_vm_module(
ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()),
config,
)
(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 python asan_minimal_repro.py
AddressSanitizer:DEADLYSIGNAL
=================================================================
==39396==ERROR: AddressSanitizer: SEGV on unknown address 0x7fbf57968050 (pc 0x7fbffd1712d8 bp 0x000000000000 sp 0x7fffe15017f0 T0)
==39396==The signal is caused by a READ memory access.
#0 0x7fbffd1712d8 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
#1 0x7fbffd1e5d78 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
#2 0x7fbffd1e5b86 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
#3 0x7fbffd11882d (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
#4 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#5 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#6 0x6557cf (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#7 0x654f99 (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#8 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#9 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#10 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#11 0x7fbfff629d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#12 0x7fbfff629e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#13 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
==39396==ABORTING
interesting - the compiler Source seems to hang on to the buffer provided in wrap_buffer, but you could try adding a print to its close() - the buffer as provided to ireeCompilerSourceWrapBuffer is unowned (it's just a char*) so if the python side doesn't keep it live it'll go boom
iow, that Source instance must remain live until the iree_vm_module_t created from it is deleted - you could add a printf in there to see if they happen out of order
Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.
Ok, so that is a definite bug with wrap_buffer used in that way. Will need to fortify testing of that and fix.
But probably not what you are trying to find...
Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.
My tests pass a clean ASan report when I use copy_buffer instead of wrap_buffer. That might be enough to call the lowering/codegen/runtime for index_put_ correct (except for the unhandled broadcast case in the torch-mlir lowering, tracked here: https://github.com/llvm/torch-mlir/issues/3433).
you could add a printf in there to see if they happen out of order
Trying this now. Printfs seem to be working from compiler Python code but I'm not seeing my changes to the runtime reflected in my venv... strange (my PYTHONPATH and build setup both seem fine...).
Sorta figured out my python bindings debug setup:
- Building with CMake and setting
PYTHONPATHwas not enough to get updated C++ code in the runtime, at least when I already hadiree-runtimeinstalled from pip. Maybe uninstalling from pip would help. What I ended up doing waspython -m pip wheel runtime/ -vto build a wheel thenpip install iree_runtime-0.dev0+[...].whl
I found that if I change BoundModule in system_api.py to no longer retain the SystemContext: https://github.com/iree-org/iree/blob/0a561c47c3dc77ce038aef0ca9bf8d8f02ff2f2a/runtime/bindings/python/iree/runtime/system_api.py#L123-L130
then the ASan error goes away. Something is trying to access memory that it shouldn't be, just not sure what specifically.
Printf debugging:
test.py calling Source.wrap_buffer(session, ...)
--- CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
--- CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer ---
test.py calling inv.output_vm_bytecode(out)
--- CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode ---
test.py calling ireert.VmModule.wrap_buffer
--- CompilerDriver.cpp: ireeCompilerOutputMapMemory ---
--- vm.cc: VmModule::WrapBuffer, (edit) close_buffer: 0 ---
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
--- CompilerDriver.cpp: ireeCompilerSourceDestroy start ---
--- CompilerDriver.cpp: ireeCompilerSourceDestroy finish ---
AddressSanitizer:DEADLYSIGNAL
=================================================================
==189798==ERROR: AddressSanitizer: SEGV on unknown address
Let's fork this repro into an issue for wrap_buffer from the compiler API. I've seen evidence of something like this before but have had trouble getting it to repro.
Sure. I'm not sure if there's actually an issue with index_put_ at all... possibly all red herrings in my testing that were caused by wrap_buffer. Can still start a more focused issue.
Frustrating.
This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.
I'll see where the code diverges and try to file a more focused issue once I know more.
If it turns out this is the issue, we can just rename this issue and not have another one. Was this the only thing wrong the whole time?
This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.
I'll see where the code diverges and try to file a more focused issue once I know more.
I mean, I've "fixed" this bug a couple of times. Something subtle is wrong and I think it needs a closer look at actual reference counts or something.
- output_buffer_reference_test stops at
module = VmModule.wrap_buffer(instance, mapped_memory) context = VmContext(instance, modules=[module]) - my test stops at
vm_module = ireert.load_vm_module( ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()), config, )
The setup appears to all be the same.
Was this the only thing wrong the whole time?
Unclear. We originally hit crashes in decode() for the sharktank llama model in Python and native tools that we were trying to reduce. Some forms of reducing the test hit unsupported lowerings (broadcasting behavior). I think there's still a different crash involved somewhere since this latest line of debugging involves only the Python bindings.
- Building with CMake and setting
PYTHONPATHwas not enough to get updated C++ code in the runtime, at least when I already hadiree-runtimeinstalled from pip. Maybe uninstalling from pip would help. What I ended up doing waspython -m pip wheel runtime/ -vto build a wheel thenpip install iree_runtime-0.dev0+[...].whl
Aha! Uninstalling the python runtime wheel and just setting PYTHONPATH to the build worked, and now I get stack traces from ASan!
scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-14/bin/llvm-symbolizer python asan_repro_debugging.py
test.py calling Source.wrap_buffer(session, ...)
CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer
test.py calling inv.output_vm_bytecode(out)
CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode
test.py calling out.map_memory()
ctypes_dl.py :: Output::map_memory
CompilerDriver.cpp: ireeCompilerOutputMapMemory
test.py calling ireert.VmModule.wrap_buffer()
vm.cc: VmModule::WrapBuffer, close_buffer: 0
vm.cc: VmModule::WrapBuffer, iree_vm_bytecode_module_create
VMFB Length = 5558
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
CompilerDriver.cpp: ireeCompilerSourceDestroy start
CompilerDriver.cpp: ireeCompilerSourceDestroy finish
AddressSanitizer:DEADLYSIGNAL
=================================================================
==229852==ERROR: AddressSanitizer: SEGV on unknown address 0x7f66510ff050 (pc 0x7f66efa5f25e bp 0x7fff9db6e9d0 sp 0x7fff9db6e950 T0)
==229852==The signal is caused by a READ memory access.
+ #0 0x7f66efa5f25e in __flatbuffers_soffset_read /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+ #1 0x7f66efa5f25e in __flatbuffers_soffset_read_from_pe /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+ #2 0x7f66efa5f25e in iree_vm_BytecodeModuleDef_exported_functions /home/scotttodd/iree-build/runtime/src/iree/schemas/bytecode_module_def_reader.h:693:1
+ #3 0x7f66efa5f25e in iree_vm_bytecode_module_lookup_function /home/scotttodd/iree/runtime/src/iree/vm/bytecode/module.c:292:9
+ #4 0x7f66efb5b497 in iree_vm_context_run_function /home/scotttodd/iree/runtime/src/iree/vm/context.c:77:26
+ #5 0x7f66efb5b497 in iree_vm_context_release_modules /home/scotttodd/iree/runtime/src/iree/vm/context.c:269:5
+ #6 0x7f66efb5acba in iree_vm_context_destroy /home/scotttodd/iree/runtime/src/iree/vm/context.c:357:5
+ #7 0x7f66ef9c0cbe in iree::python::ApiPtrAdapter<iree_vm_context_t>::Release(iree_vm_context_t*) /home/scotttodd/iree/runtime/bindings/python/./vm.h:42:47
+ #8 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::Release() /home/scotttodd/iree/runtime/bindings/python/./binding.h:107:7
+ #9 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::~ApiRefCounted() /home/scotttodd/iree/runtime/bindings/python/./binding.h:59:22
+ #10 0x7f66ef9e40f5 in nanobind::detail::inst_dealloc(_object*) /home/scotttodd/iree/.venv/lib/python3.11/site-packages/nanobind/src/nb_type.cpp:229:13
#11 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#12 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#13 0x6557cf (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#14 0x654f99 (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#15 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#16 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#17 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#18 0x7f66f2229d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#19 0x7f66f2229e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#20 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2 in __flatbuffers_soffset_read
==229852==ABORTING
So the VM is trying to run the __deinit function on the module, but the module was already freed...?
https://github.com/iree-org/iree/blob/71c07faba52f527b08abc73c652ba140f1d8aa54/runtime/src/iree/vm/context.c#L269-L270
Yeah, ASan is happy if I clear the loaded module before letting the program exit on its own:
loaded_module = ireert.load_vm_module(
wrapped_buffer,
config,
)
+loaded_module = None
This passes ASan, finishing with no leaks or segfaults:
instance = VmInstance()
output = Output.open_membuffer()
output.write(vmfb_contents)
mapped_memory = output.map_memory()
module = VmModule.wrap_buffer(instance, mapped_memory)
context = VmContext(instance, modules=[module])
This crashes and trips ASan with a segfault on a read memory access:
instance = VmInstance()
output = Output.open_membuffer()
output.write(vmfb_contents)
mapped_memory = output.map_memory()
module = VmModule.wrap_buffer(instance, mapped_memory)
# note this line is different!
loaded_module = load_vm_module(module)
The source for that different line is here: https://github.com/iree-org/iree/blob/71c07faba52f527b08abc73c652ba140f1d8aa54/runtime/bindings/python/iree/runtime/system_api.py#L247-L251
the SystemContext class creates BoundModule classes that retain a reference to the self SystemContext: https://github.com/iree-org/iree/blob/71c07faba52f527b08abc73c652ba140f1d8aa54/runtime/bindings/python/iree/runtime/system_api.py#L204-L206
Is that reference cycle throwing off the usual garbage collector / shutdown / destruction ordering?