Segfaults on macOS ARM
Hi!
We're getting some crashes on macOS ARM (MBP M1 Pro), which we believe happen when parsing MXF files from Avid Media Composer (they can be audio, video, image, or pure data files). Example log:
Crashed Thread: 42
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000025
Exception Codes: 0x0000000000000001, 0x0000000000000025
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [25968]
Thread 42 Crashed:
0 libmediainfo.0.dylib 0x382f2d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1 libmediainfo.0.dylib 0x382b657b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2 libmediainfo.0.dylib 0x382b609e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3 libmediainfo.0.dylib 0x382fd55dc MediaInfo_Open + 176
4 libffi.dylib 0x1a2af5050 ffi_call_SYSV + 80
We're calling MediaInfo from Python via pymediainfo:
import platform
import pymediainfo
from pymediainfo import MediaInfo
print(f"{pymediainfo.__version__}")
print(f"macOS {platform.mac_ver()}")
print(MediaInfo._get_library())
prints:
7.0.1
macOS ('15.0.1', ('', '', ''), 'arm64')
(<CDLL '/Users/maxlund/pymediainfo-test/.venv/lib/python3.10/site-packages/pymediainfo/libmediainfo.0.dylib', handle 65874810 at 0x122263d30>, 105553140273472, '24.12', (24, 12))
I downloaded libmediainfo via Homebrew and see that it's a later version:
print(MediaInfo._get_library("/opt/homebrew/Cellar/libmediainfo/25.04/lib/libmediainfo.0.dylib"))
-> (<CDLL '/opt/homebrew/Cellar/libmediainfo/25.04/lib/libmediainfo.0.dylib', handle 65874830 at 0x133723ac0>, 105553151484000, '25.04', (25, 4))
Do you think anything would be fixed by switching to that version? More logs from all the crashes we've seen (so far):
Thread 36 Crashed:
0 libmediainfo.0.dylib 0x344d5d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1 libmediainfo.0.dylib 0x3449957b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2 libmediainfo.0.dylib 0x3449909e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3 libmediainfo.0.dylib 0x344e055dc MediaInfo_Open + 176
Thread 27 Crashed:
0 libmediainfo.0.dylib 0x38fd6d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1 libmediainfo.0.dylib 0x38f9a57b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2 libmediainfo.0.dylib 0x38f9a09e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3 libmediainfo.0.dylib 0x38fe155dc MediaInfo_Open + 176
4 libffi.dylib 0x1a2af5050 ffi_call_SYSV + 80
Thread 24 Crashed:
0 libmediainfo.0.dylib 0x359d21554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1 libmediainfo.0.dylib 0x3599597b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2 libmediainfo.0.dylib 0x3599549e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3 libmediainfo.0.dylib 0x359dc95dc MediaInfo_Open + 176
Thread 35 Crashed:
0 libsystem_kernel.dylib 0x196aba5d0 __pthread_kill + 8
1 libsystem_pthread.dylib 0x196af2c20 pthread_kill + 288
2 libsystem_c.dylib 0x1969ffa30 abort + 180
3 libsystem_malloc.dylib 0x19690fdc4 malloc_vreport + 896
4 libsystem_malloc.dylib 0x196913430 malloc_report + 64
5 libsystem_malloc.dylib 0x19692d494 find_zone_and_free + 528
6 libmediainfo.0.dylib 0x40d739938 MediaInfoLib::File__Analyze::Stream_Prepare(MediaInfoLib::stream_t, unsigned long) + 1408
7 libmediainfo.0.dylib 0x40dae2c94 MediaInfoLib::File_Mxf::Streams_Finish_Essence(unsigned int, ZenLib::uint128) + 1196
8 libmediainfo.0.dylib 0x40dae1b74 MediaInfoLib::File_Mxf::Streams_Finish_Track(ZenLib::uint128) + 156
9 libmediainfo.0.dylib 0x40dae1878 MediaInfoLib::File_Mxf::Streams_Finish_Package(ZenLib::uint128) + 152
10 libmediainfo.0.dylib 0x40dae0e24 MediaInfoLib::File_Mxf::Streams_Finish_ContentStorage(ZenLib::uint128) + 144
11 libmediainfo.0.dylib 0x40dae0504 MediaInfoLib::File_Mxf::Streams_Finish_Preface(ZenLib::uint128) + 128
12 libmediainfo.0.dylib 0x40dad96cc MediaInfoLib::File_Mxf::Streams_Finish() + 604
13 libmediainfo.0.dylib 0x40d70ec7c MediaInfoLib::File__Analyze::ForceFinish(char const*) + 1064
14 libmediainfo.0.dylib 0x40daf1898 MediaInfoLib::File_Mxf::Read_Buffer_AfterParsing() + 476
15 libmediainfo.0.dylib 0x40d70e5b8 MediaInfoLib::File__Analyze::Open_Buffer_Continue_Loop() + 444
16 libmediainfo.0.dylib 0x40d70d934 MediaInfoLib::File__Analyze::Open_Buffer_Continue(unsigned char const*, unsigned long) + 1344
17 libmediainfo.0.dylib 0x40d70ff20 MediaInfoLib::File__Analyze::Open_Buffer_Finalize(bool) + 312
18 libmediainfo.0.dylib 0x40d7c194c MediaInfoLib::MediaInfo_Internal::Open_Buffer_Finalize() + 52
19 libmediainfo.0.dylib 0x40db86b7c MediaInfoLib::Reader_File::Format_Test_PerParser_Continue(MediaInfoLib::MediaInfo_Internal*) + 3372
20 libmediainfo.0.dylib 0x40db85ce0 MediaInfoLib::Reader_File::Format_Test_PerParser(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 1104
21 libmediainfo.0.dylib 0x40db856c8 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 1180
22 libmediainfo.0.dylib 0x40d7bd7b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
23 libmediainfo.0.dylib 0x40d7b89e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
24 libmediainfo.0.dylib 0x40dc2d5dc MediaInfo_Open + 176
Any assistance is much appreciated. Thanks for all the great work!
Please provide the file ([email protected] if it can not be shared publicly).
We haven't been able to pinpoint which file actually caused the issue, as we ran through thousands of files and weren't able to log an exception due to the crash. Currently trying to reproduce the crash with verbose logging, stay tuned.
However, we know that we can't reliably reproduce it since it occurred for the same set of MXF files that I've now successfully parsed using the MediaInfo lib.
That is typical of memory corruption. Tools like Valgrind can pinpoint where it happens, if you can test with Valgrind (or we can, if you provide the file).
I just managed to reproduce it by running two threads through a couple of thousand MXF files. These are the last lines of the application logs:
2025-06-04 18:36:34,042 - INFO - pymediainfo.classify /Users/Shared/AvidMediaComposer/Avid MediaFiles/MXF/1/2ee500d1V01.C32CBEA2.608300.mxf
2025-06-04 18:36:34,100 - INFO - pymediainfo.classify /Volumes/Macintosh HD/Users/Shared/AvidMediaComposer/Avid MediaFiles/MXF/1/0cce979fV01.C1582149.608300.mxf
Resulting in these macOS crash logs:
Crashed Thread: 32
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000415111001200
Exception Codes: 0x0000000000000001, 0x0000415111001200
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [26810]
Thread 32 Crashed:
0 libsystem_platform.dylib 0x192d5be40 _platform_memmove + 144
1 libmediainfo.0.dylib 0x31da0b380 MediaInfoLib::MediaInfo_Internal::Get(MediaInfoLib::stream_t, unsigned long, unsigned long, MediaInfoLib::info_t) + 2228
2 libmediainfo.0.dylib 0x31d9f4c0c MediaInfoLib::MediaInfo_Internal::Inform(MediaInfoLib::stream_t, unsigned long, bool) + 9064
3 libmediainfo.0.dylib 0x31d9f10ac MediaInfoLib::MediaInfo_Internal::Inform() + 12516
4 libmediainfo.0.dylib 0x31da11528 MediaInfoLib::MediaInfo_Internal::Inform(std::__1::vector<MediaInfoLib::MediaInfo_Internal*, std::__1::allocator<MediaInfoLib::MediaInfo_Internal*>>&) + 13280
5 libmediainfo.0.dylib 0x31da0dfdc MediaInfoLib::MediaInfo_Internal::Inform(MediaInfoLib::MediaInfo_Internal*) + 120
6 libmediainfo.0.dylib 0x31d9c8bf0 MediaInfoLib::MediaInfo::Inform(unsigned long) + 32
7 libmediainfo.0.dylib 0x31dddcb5c MediaInfo_Inform + 160
It's most likely the file in the last line of the logs that causes the issue
0cce979fV01.C1582149.608300.mxf
... but I can't say for sure, since the files in the logs come from separate threads.
I've emailed both files to [email protected] right now.
Ok I can reliably reproduce it by simulating a bunch of concurrent calls:
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from pymediainfo import MediaInfo

MEDIAINFO_LIB_PATH = os.path.expanduser('~/libmediainfo.0.dylib')
N_THREADS = 32
RUNS_PER_THREAD = 100
paths = [
    os.path.expanduser('~/mxf-error-file/0cce979fV01.C1582149.608300.mxf'),
    os.path.expanduser('~/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf')
]
abort_flag = False

def mediainfo_parse(path):
    global abort_flag
    if abort_flag:
        return None
    try:
        print(f"Parsing {path}")
        MediaInfo.parse(filename=path, library_file=MEDIAINFO_LIB_PATH, mediainfo_options={
            "setlocale_LC_CTYPE": "",
            "CharSet": "UTF-8"
        })
        return f'{path} parsed OK'
    except Exception as e:
        abort_flag = True
        print(f"Error parsing {path}: {e}")
        print("Setting abort flag to stop spam...")
        return None

def _worker(repeats: int):
    for _ in range(repeats):
        for p in paths:
            mediainfo_parse(p)

if __name__ == "__main__":
    print(f"MediaInfo._get_library: {MediaInfo._get_library(MEDIAINFO_LIB_PATH)[2:]}")
    print("Running synchronously...")
    for i in range(10):
        for p in paths:
            mediainfo_parse(p)
    print("Done running synchronously")
    print("Starting concurrent calls...")
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(_worker, RUNS_PER_THREAD) for _ in range(N_THREADS)]
        for f in as_completed(futures):
            f.result()
Output:
└─ $ ▶ python mediainfo_crash.py
MediaInfo._get_library: ('25.04', (25, 4))
Running synchronously...
Parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf
... etc
Parsing /Users/maxlund/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf
Done running synchronously
Starting concurrent calls...
Parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf
.. etc
Parsing /Users/maxlund/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0
Setting abort flag to stop spam...
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0
Setting abort flag to stop spam...
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0
Guess we'll just add a lock for now. Not a huge deal, but we might end up blocking ourselves and hurting performance in some situations.
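For anyone landing here with the same problem, such a lock could look roughly like this. It is a minimal sketch, not pymediainfo API: `serialized` is a hypothetical helper name, and `fake_parse` is a stand-in for `MediaInfo.parse` so that the snippet runs without the library installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

def serialized(func):
    """Wrap func so that only one thread runs it at a time (hypothetical helper)."""
    lock = threading.Lock()

    @wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return wrapper

# Stand-in for MediaInfo.parse; it tracks how many calls overlap.
active = 0
max_active = 0

def fake_parse(path):
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    active -= 1
    return f"{path} parsed OK"

safe_parse = serialized(fake_parse)

# Many threads, but the lock lets only one call through at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(safe_parse, [f"file{i}.mxf" for i in range(100)]))
```

In the real application the wrapper would be applied as `safe_parse = serialized(MediaInfo.parse)`. The cost is the one already mentioned: every parse waits for the previous one to finish.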
When a configuration parameter is provided, pymediainfo clears the MediaInfo configuration at the end of the parse() function. Since the configuration is shared by all threads, this causes inconsistencies and crashes and should be avoided when using multiple threads, as indicated in the documentation: https://github.com/sbraz/pymediainfo/blob/master/src/pymediainfo/__init__.py#L435-L438
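To make the inconsistency concrete, here is a toy model of it. This is not MediaInfo or pymediainfo code: `shared_config`, `set_options`, `reset` and `read_charset` are made-up names that only mimic "one configuration shared by all callers, cleared at the end of each call". The interleaving of the two calls is written out by hand, so the result is deterministic.

```python
# Toy model: a single configuration shared by every caller (an assumption,
# standing in for the process-wide MediaInfo configuration).
shared_config = {}

def set_options(options):
    shared_config.update(options)

def reset():
    # Mimics the reset issued at the end of parse() when options were given.
    shared_config.clear()

def read_charset():
    return shared_config.get("CharSet", "<default>")

opts = {"setlocale_LC_CTYPE": "", "CharSet": "UTF-8"}

# Two parse() calls, A and B, interleaved the way two threads might be.
set_options(opts)            # A starts and sets its options
set_options(opts)            # B starts and sets the same options
seen_by_a = read_charset()   # A sees the options it asked for
reset()                      # A finishes and clears the shared configuration
seen_by_b = read_charset()   # B is still running, but its options are gone

print(seen_by_a, seen_by_b)
```

This only illustrates the inconsistency between calls; it says nothing about where exactly the library crashes.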
This is definitely something that needs to be changed on MediaInfo's side. In the meantime, though, you can set the options before calling parse, like this:
#!/usr/bin/python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from pymediainfo import MediaInfo

MEDIAINFO_LIB_PATH = os.path.expanduser('~/mxf-error-file/libmediainfo.so.0.0.0')
N_THREADS = 32
RUNS_PER_THREAD = 100
paths = [
    os.path.expanduser('~/mxf-error-file/0cce979fV01.C1582149.608300.mxf'),
    os.path.expanduser('~/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf')
]
abort_flag = False

def mediainfo_parse(path):
    global abort_flag
    if abort_flag:
        return None
    try:
        print(f"Parsing {path}")
        # No mediainfo_options here, so parse() does not reset the shared configuration
        MediaInfo.parse(filename=path, library_file=MEDIAINFO_LIB_PATH)
        return f'{path} parsed OK'
    except Exception as e:
        abort_flag = True
        print(f"Error parsing {path}: {e}")
        print("Setting abort flag to stop spam...")
        return None

def _worker(repeats: int):
    for _ in range(repeats):
        for p in paths:
            mediainfo_parse(p)

if __name__ == "__main__":
    print(f"MediaInfo._get_library: {MediaInfo._get_library(MEDIAINFO_LIB_PATH)[2:]}")
    # Set the options once, up front, instead of passing them to every parse() call
    lib, handle = MediaInfo._get_library(MEDIAINFO_LIB_PATH)[0:2]
    lib.MediaInfo_Option(handle, "setlocale_LC_CTYPE", "")
    lib.MediaInfo_Option(handle, "CharSet", "UTF-8")
    print("Running synchronously...")
    for i in range(10):
        for p in paths:
            mediainfo_parse(p)
    print("Done running synchronously")
    print("Starting concurrent calls...")
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(_worker, RUNS_PER_THREAD) for _ in range(N_THREADS)]
        for f in as_completed(futures):
            f.result()
I didn't notice @maxlund was using options. Thanks for the debugging and the snippet, @g-maxime! Indeed, calling MediaInfo_Option manually will work because it won't trigger https://github.com/sbraz/pymediainfo/blob/daf3596e33686c17639d4bd1a4f560983f24ea35/src/pymediainfo/__init__.py#L551-L552
I could add an option to not clear options at the end of parse() but I feel like it has too many options already 😅
I won't add this example to the documentation as it calls a private method but if I ever receive similar bug reports, I can point users to your comment until the library's thread-safety problem is fixed.
Aha, thanks @g-maxime! So the .MXF files were a red herring. Sounds like I can just add the two lines calling MediaInfo_Option once during application startup, and remove the concurrency safeguards I added then.
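A rough sketch of that "set the options once at startup" idea, guarded so that it cannot run twice even if several threads reach it. `configure_once` and `record` are hypothetical names; `record` is a stand-in that just logs the calls, so the snippet runs without pymediainfo.

```python
import threading

_lock = threading.Lock()
_configured = False

def configure_once(set_option, options):
    """Apply options through set_option at most once per process.

    Returns True if this call applied them, False if they were already applied.
    """
    global _configured
    with _lock:
        if _configured:
            return False
        for name, value in options.items():
            set_option(name, value)
        _configured = True
        return True

# Stand-in for the real option setter; it only records what it was given.
calls = []

def record(name, value):
    calls.append((name, value))

options = {"setlocale_LC_CTYPE": "", "CharSet": "UTF-8"}
first = configure_once(record, options)    # applies the options
second = configure_once(record, options)   # does nothing
```

With pymediainfo, `set_option` would be something like `lambda name, value: lib.MediaInfo_Option(handle, name, value)`, with `lib` and `handle` taken from the private `MediaInfo._get_library()` call shown in the snippet above.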
Just FYI the reason I'm supplying options to the calls is that otherwise using non-ASCII paths failed. One of our users had their MXF files here:
/Volumes/avid06’s Mac Studio.1/some-mxf-file.mxf
Which failed due to the "Mac-style" ’ in avid06’s. This is not the case when calling MediaInfo CLI via Python subprocess calls (I assume locale is inherited from the parent process, which is why I never caught this when testing locally since I would either launch our binary or run the code directly from a Terminal or IDE that already had a UTF-8 locale).
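One way to check that assumption is to log what locale the process actually sees, once when started from a Terminal and once from the packaged app. A small sketch using only the standard library (`describe_locale` is just an illustrative name):

```python
import locale
import os

def describe_locale():
    """Return what the current process knows about its character locale."""
    return {
        # Without a second argument, setlocale() only queries the current setting.
        "LC_CTYPE": locale.setlocale(locale.LC_CTYPE),
        "LANG (env)": os.environ.get("LANG"),
        "LC_ALL (env)": os.environ.get("LC_ALL"),
        "LC_CTYPE (env)": os.environ.get("LC_CTYPE"),
    }

info = describe_locale()
for key, value in info.items():
    print(f"{key}: {value!r}")
```

If the environment variables are all unset in the packaged app but set in the Terminal, that would be consistent with the locale being inherited from the parent process.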
@maxlund The Charset is always forced as UTF-8:
https://github.com/sbraz/pymediainfo/blob/daf3596e33686c17639d4bd1a4f560983f24ea35/src/pymediainfo/__init__.py#L487
I'm surprised that you cannot parse paths with non-ASCII names. Does the pymediainfo test suite pass on your system? I'm especially interested in pytest tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file.
└─ $ ▶ pytest tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file
=============================================================================================== test session starts ================================================================================================
platform darwin -- Python 3.10.9, pytest-8.4.0, pluggy-1.6.0 -- /usr/local/bin/python3
cachedir: .pytest_cache
rootdir: /Users/maxlund/pymediainfo
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 1 item
tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file PASSED [100%]
================================================================================================ 1 passed in 0.02s =================================================================================================
This issue only showed up once I built a PyInstaller binary containing the pymediainfo calls, so the test passing is expected: I had no issues when testing locally, only after actually building the binary. We could probably recreate a minimal example with just pyinstaller and pymediainfo as dependencies. For reference, I build with something like
pyinstaller --windowed main.py --noconfirm
So you were actually hitting https://github.com/sbraz/pymediainfo/issues/121 then? And forcing setlocale_LC_CTYPE fixes it?
@sbraz Sorry for the late reply, I missed your message somehow - yes, this fixes things:
lib, handle = MediaInfo._get_library(MEDIAINFO_LIB_PATH)[0:2]
lib.MediaInfo_Option(handle, "setlocale_LC_CTYPE", "")
lib.MediaInfo_Option(handle, "CharSet", "UTF-8")
I also do:
media_info = MediaInfo.parse(filename=file_path, library_file=MEDIAINFO_LIB_PATH, mediainfo_options={
    "setlocale_LC_CTYPE": "",  # empty string means use the process current LC_CTYPE
    "CharSet": "UTF-8"
})
Maybe setting mediainfo_options in the parse() calls is actually not a good idea? It just feels "wrong" to rely on those options being set implicitly.
Looking at the code, I would want to avoid this though I suppose?
# Reset all options to their defaults so that they aren't
# retained when the parse method is called several times
# https://github.com/MediaArea/MediaInfoLib/issues/1128
# Do not call it when it is not required because it breaks threads
# https://github.com/sbraz/pymediainfo/issues/76#issuecomment-575245093
if mediainfo_options is not None and lib_version >= (19, 9):
    lib.MediaInfo_Option(handle, "Reset", "")
If you don't call parse() several times in multiple threads, you should be fine.
Understood, but we could have situations where that happens. Using a runtime hook in PyInstaller to set the locale, together with MediaInfo_Option, seems to work fine with non-ASCII file paths, though.
Thanks for the feedback. Here is a link to said hook, to help PyInstaller users: https://github.com/sbraz/pymediainfo/issues/121#issuecomment-3269037665.
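For readers who only want the general shape of such a hook, here is a generic sketch. It is not a copy of the linked hook: the function name `ensure_utf8_ctype` and the `en_US.UTF-8` fallback are assumptions, and it presumes the problem is that the packaged app starts without any locale variables in its environment.

```python
import locale
import os

def ensure_utf8_ctype(fallback="en_US.UTF-8"):
    """Give the process a character locale if it was started without one.

    The fallback locale name is an assumption and may not exist on every system.
    Returns the locale string now in effect, or None if it could not be set.
    """
    if not any(os.environ.get(var) for var in ("LC_ALL", "LC_CTYPE", "LANG")):
        os.environ["LC_CTYPE"] = fallback
    try:
        # An empty string means "take the locale from the environment".
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None

result = ensure_utf8_ctype()
print(f"LC_CTYPE in effect: {result!r}")
```

PyInstaller runs such a file before the application code when it is passed with `--runtime-hook` at build time.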