[Python] a segfault on `import pyarrow` on MacOS 11.6 after pip update to pyarrow version 16.1.0
Describe the bug, including details regarding any error messages, version, and platform.
Hi,
I am running into an issue with the newly released 16.1.0 version. Version 16.0.0 works without issues, but updating to 16.1.0 causes a segfault. Any help is greatly appreciated. Thanks!
- Python 3.11.5
- Clang 14.0.6
- MacOS 11.6 (x86_64)
% conda list
# packages in environment at /Users/yana/anaconda3/envs/pyarrow-test:
#
# Name Version Build Channel
blas 1.0 mkl
bottleneck 1.3.7 py311hb3a5e46_0
bzip2 1.0.8 h6c40b1e_6
ca-certificates 2024.3.11 hecd8cb5_0
intel-openmp 2023.1.0 ha357a0b_43548
libcxx 14.0.6 h9765a3e_0
libffi 3.4.4 hecd8cb5_1
mkl 2023.1.0 h8e150cf_43560
mkl-service 2.4.0 py311h6c40b1e_1
mkl_fft 1.3.8 py311h6c40b1e_0
mkl_random 1.2.4 py311ha357a0b_0
ncurses 6.4 hcec6c5f_0
numexpr 2.8.7 py311h728a8a3_0
numpy 1.26.4 py311h728a8a3_0
numpy-base 1.26.4 py311h53bf9ac_0
openssl 3.0.13 hca72f7f_1
pandas 2.2.1 py311hdb55bb0_0
pip 24.0 py311hecd8cb5_0
pyarrow 16.1.0 pypi_0 pypi
python 3.11.5 hf27a42d_0
python-dateutil 2.9.0post0 py311hecd8cb5_0
python-tzdata 2023.3 pyhd3eb1b0_0
pytz 2024.1 py311hecd8cb5_0
readline 8.2 hca72f7f_0
setuptools 69.5.1 py311hecd8cb5_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.45.3 h6c40b1e_0
tbb 2021.8.0 ha357a0b_0
tk 8.6.14 h4d00af3_0
tzdata 2024a h04d1e81_0
wheel 0.43.0 py311hecd8cb5_0
xz 5.4.6 h6c40b1e_1
zlib 1.2.13 h4b97444_1
% python
Python 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
zsh: segmentation fault python
test_pyarrow.py contains a single line, import pyarrow:
% lldb python test_pyarrow.py
(lldb) target create "python"
Current executable set to '/Users/yana/anaconda3/bin/python' (x86_64).
(lldb) settings set -- target.run-args "test_pyarrow.py"
(lldb) run
Process 35199 launched: '/Users/yana/anaconda3/bin/python' (x86_64)
Process 35199 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000129c8685e libarrow.1601.dylib`malloc_conf_init_helper + 302
libarrow.1601.dylib`malloc_conf_init_helper:
-> 0x129c8685e <+302>: movzbl (%rbx), %eax
0x129c86861 <+305>: testb %al, %al
0x129c86863 <+307>: je 0x129c8936e ; <+11326>
0x129c86869 <+313>: movq %rbx, %rcx
(lldb) exit
Component(s)
Python, Release
@ianna thanks for the report!
The crash seems to be related to jemalloc. Not a solution, but just curious: could you set ARROW_DEFAULT_MEMORY_POOL=system env variable before importing pyarrow?
@jorisvandenbossche - thanks for prompt reply! Setting the env variable does not help :-(
% python
Python 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.environ["ARROW_DEFAULT_MEMORY_POOL"])
system
>>> import pyarrow
zsh: segmentation fault python
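For anyone repeating this check, it can help to attempt the import in a child process so that a crash does not take down the interactive session, and so the variable is guaranteed to be in the environment before the library loads. A minimal sketch using only the standard library; `try_import` and `describe` are hypothetical helper names, not part of pyarrow:

```python
import os
import signal
import subprocess
import sys


def try_import(module, env_overrides=None):
    """Import `module` in a child interpreter and return the exit code.

    On POSIX, a negative code -N means the child was killed by signal N.
    """
    env = dict(os.environ)
    env.update(env_overrides or {})
    result = subprocess.run(
        # faulthandler dumps the Python-level traceback to stderr on a crash
        [sys.executable, "-X", "faulthandler", "-c", f"import {module}"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode


def describe(returncode):
    """Turn an exit code into a short human-readable message."""
    if returncode == 0:
        return "import succeeded"
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Same setting as suggested above, applied only to the child process.
    code = try_import("pyarrow", {"ARROW_DEFAULT_MEMORY_POOL": "system"})
    print(describe(code))
```

Running the import once with and once without the override shows whether the setting changes the outcome. In this thread it did not.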
I can reproduce on MacOS x86_64 with pip install pyarrow and import pyarrow. The lldb backtrace I get is:
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
* frame #0: 0x00000001077a685e libarrow.1601.dylib`malloc_conf_init_helper + 302
frame #1: 0x00000001077a6207 libarrow.1601.dylib`malloc_init_hard_a0_locked + 135
frame #2: 0x00000001077a9c1f libarrow.1601.dylib`malloc_init_hard + 159
frame #3: 0x00000001006bbb47 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
frame #4: 0x00000001006bbf52 dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
frame #5: 0x00000001006b6ae6 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 492
frame #6: 0x00000001006b6a51 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 343
frame #7: 0x00000001006b6a51 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 343
frame #8: 0x00000001006b6a51 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 343
frame #9: 0x00000001006b6a51 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 343
frame #10: 0x00000001006b489f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
frame #11: 0x00000001006b4940 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
frame #12: 0x00000001006a4a12 dyld`dyld::runInitializers(ImageLoader*) + 82
frame #13: 0x00000001006b011a dyld`dlopen_internal + 616
frame #14: 0x00007fff20406c94 libdyld.dylib`dlopen_internal(char const*, int, void*) + 185
frame #15: 0x00007fff203f507e libdyld.dylib`dlopen + 28
frame #16: 0x0000000100244ff5 python`_PyImport_LoadDynamicModuleWithSpec + 693
frame #17: 0x0000000100244337 python`_imp_create_dynamic + 167
frame #18: 0x00000001000e84fa python`cfunction_vectorcall_FASTCALL + 106
frame #19: 0x00000001001f6175 python`_PyEval_EvalFrameDefault + 284165
frame #20: 0x000000010007bec9 python`_PyObject_VectorcallTstate.779 + 73
frame #21: 0x0000000100080c1e python`object_vacall + 414
frame #22: 0x0000000100080a2a python`PyObject_CallMethodObjArgs + 234
frame #23: 0x0000000100242262 python`PyImport_ImportModuleLevelObject + 3490
frame #24: 0x00000001001e3b4a python`_PyEval_EvalFrameDefault + 208858
frame #25: 0x00000001001ae2bd python`PyEval_EvalCode + 253
frame #26: 0x00000001001a9a55 python`builtin_exec + 437
frame #27: 0x00000001000e8462 python`cfunction_vectorcall_FASTCALL_KEYWORDS + 98
frame #28: 0x00000001001f6175 python`_PyEval_EvalFrameDefault + 284165
frame #29: 0x000000010007bec9 python`_PyObject_VectorcallTstate.779 + 73
frame #30: 0x0000000100080c1e python`object_vacall + 414
frame #31: 0x0000000100080a2a python`PyObject_CallMethodObjArgs + 234
frame #32: 0x0000000100242262 python`PyImport_ImportModuleLevelObject + 3490
frame #33: 0x00000001001e3b4a python`_PyEval_EvalFrameDefault + 208858
frame #34: 0x00000001001ae2bd python`PyEval_EvalCode + 253
frame #35: 0x0000000100277893 python`run_mod + 275
frame #36: 0x0000000100277c80 python`PyRun_InteractiveOneObjectEx + 592
frame #37: 0x0000000100276e1d python`_PyRun_InteractiveLoopObject + 141
frame #38: 0x00000001002769bf python`_PyRun_AnyFileObject + 63
frame #39: 0x0000000100279cca python`PyRun_AnyFileExFlags + 58
frame #40: 0x00000001002a333f python`pymain_run_stdin + 175
frame #41: 0x00000001002a294d python`Py_RunMain + 637
frame #42: 0x0000000100001a68 python`main + 56
frame #43: 0x00007fff20404f3d libdyld.dylib`start + 1
thread #2
frame #0: 0x00007fff203b593e libsystem_kernel.dylib`__workq_kernreturn + 10
frame #1: 0x00007fff203e64c1 libsystem_pthread.dylib`_pthread_wqthread + 414
frame #2: 0x00007fff203e542f libsystem_pthread.dylib`start_wqthread + 15
thread #3
frame #0: 0x00007fff203b593e libsystem_kernel.dylib`__workq_kernreturn + 10
frame #1: 0x00007fff203e64c1 libsystem_pthread.dylib`_pthread_wqthread + 414
frame #2: 0x00007fff203e542f libsystem_pthread.dylib`start_wqthread + 15
(lldb)
I have the same EXC_BAD_ACCESS issue and lldb backtrace on MacOS 11.7.4 (x86_64) with python 3.11.0.
pip install pyarrow==16.0.0 does not have the issue.
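Since importing the package is what crashes, the installed version can be confirmed from the package metadata instead, which does not load the native library. A small sketch with the standard library; `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None.

    Reads the distribution metadata only; the package itself is not imported.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pyarrow:", installed_version("pyarrow"))
```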
I did a quick google of the stack trace, and there's a note in jemalloc specifically about the function that crashes (may or may not be related):
https://github.com/jemalloc/jemalloc/blob/5afff2e44e8d31ef1e9eb01d6b1327fe111835ed/src/jemalloc.c#L229-L245
Reproduced:
* If another constructor in the same binary is using mallctl to e.g.
* set up extent hooks, it may end up running before this one, and
* malloc_init_hard will crash trying to lock the uninitialized lock. So
* we force an initialization of the lock in malloc_init_hard as well.
Some links to possible places referred to by the stack trace:
- malloc_conf_init_helper(): https://github.com/jemalloc/jemalloc/blob/5afff2e44e8d31ef1e9eb01d6b1327fe111835ed/src/jemalloc.c#L1084
- Call to malloc_conf_init_helper() in malloc_conf_init(): https://github.com/jemalloc/jemalloc/blob/5afff2e44e8d31ef1e9eb01d6b1327fe111835ed/src/jemalloc.c#L1795
- Call to malloc_conf_init() in malloc_init_hard_a0_locked(): https://github.com/jemalloc/jemalloc/blob/5afff2e44e8d31ef1e9eb01d6b1327fe111835ed/src/jemalloc.c#L1863
The diff between 16.0.0 and 16.1.0 is here: https://github.com/apache/arrow/compare/apache-arrow-16.0.0...apache-arrow-16.1.0.
The only change that seems somewhat related is https://github.com/apache/arrow/pull/41567 (unless the scalar scratch space fixes somehow caused this). That is also strange, though, because I assume macos-latest should already have defaulted to macos-13 at some point in the past?
cc @raulcd
I can reproduce on MacOS x86_64
@paleolimbot with which version of macOS could you reproduce this? The OP reported macOS 11, and duckdb, for example, also observed segfaults on import specifically on macOS 11 with x86_64 (https://github.com/duckdb/duckdb/issues/12199#issuecomment-2126992958). They decided to bump their released wheels to target macOS 12+.
I tested this on MacOS 11!
@paleolimbot one more question: could you test with the latest nightly wheel? (I assume nothing changed or fixed it, but just to be sure, because if it is still failing I think we should mark this as a blocker for the release.)
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ \
--prefer-binary --pre pyarrow
From seeing similar segfaults in other projects with a complex C++ dependency (https://github.com/duckdb/duckdb/issues/12199#issuecomment-2126992958, https://github.com/geopandas/pyogrio/pull/417#issuecomment-2155856289 building with GDAL), my general assumption is that this is something on the macOS side, and with macOS 11 no longer being supported, probably the only thing we can do is bump the deployment target for our wheels to macOS 12.
That will mean that someone trying to install pyarrow on macOS 11 (as done in this issue) will no longer have a wheel available, and pip will then try to install from source. That will typically fail at installation time, but at least it will not segfault at runtime.
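To illustrate why raising the deployment target makes pip skip the wheel on macOS 11, here is a simplified sketch of the version comparison behind macOS platform tags. It is an approximation, not pip's actual resolution logic: it ignores `universal2` and other multi-architecture tags, and the tag strings in the example are illustrative rather than taken from the published pyarrow wheels. `wheel_matches_macos` and `current_macos` are hypothetical helper names.

```python
import platform
import re

# Matches tags such as "macosx_12_0_x86_64" (simplified: single-arch tags only).
_MACOS_TAG = re.compile(r"^macosx_(\d+)_(\d+)_(\w+)$")


def wheel_matches_macos(platform_tag, macos_version, machine):
    """Return True if a wheel's platform tag is usable on the given macOS.

    A wheel is usable when its architecture matches and its minimum macOS
    version (the deployment target baked into the tag) is not newer than
    the running system.
    """
    m = _MACOS_TAG.match(platform_tag)
    if m is None:
        return False
    major, minor, arch = int(m.group(1)), int(m.group(2)), m.group(3)
    if arch != machine:
        return False
    return (major, minor) <= tuple(macos_version)


def current_macos():
    """Return (major, minor) of the running macOS, or None on other systems."""
    release = platform.mac_ver()[0]
    if not release:
        return None
    parts = release.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (int(parts[0]), minor)


if __name__ == "__main__":
    # A wheel targeting macOS 12 is rejected on macOS 11.6 ...
    print(wheel_matches_macos("macosx_12_0_x86_64", (11, 6), "x86_64"))
    # ... while a wheel with an older deployment target is accepted.
    print(wheel_matches_macos("macosx_10_15_x86_64", (11, 6), "x86_64"))
```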
could you test with the latest nightly wheel
I just tested this again (MacOS 11.7.10, Python 3.11) and it resulted in a segfault as before!
Should we bump the deployment target to macOS 12 before the 17.0.0 release? @jorisvandenbossche @pitrou
Should we bump the deployment target to macOS 12 before the 17.0.0 release? @jorisvandenbossche @pitrou
FYI, the message from GitHub Actions is that "The macOS-11 environment is deprecated and will be removed on June 28th, 2024."
Since MacOS 11 is no longer supported by the manufacturer, I think it would be fine to bump to MacOS 12. Sounds like GH actions will force us to drop it anyways.
I don't think this is a release blocker, as this is already an issue, and I am facing some issues when building and linking Arrow with MACOSX_DEPLOYMENT_TARGET=12. I'll move it to 18.0.0 and will keep working on it for the next release.
Issue resolved by pull request 43137 https://github.com/apache/arrow/pull/43137