MediaInfoLib

Segfaults on macOS ARM

Open maxlund opened this issue 6 months ago • 13 comments

Hi!

We're getting some crashes on macOS ARM (MBP M1 Pro), which we believe happen when parsing MXF files from Avid Media Composer (i.e. they can be audio, video, image, or pure data files). Example log:

Crashed Thread:        42

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000025
Exception Codes:       0x0000000000000001, 0x0000000000000025

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [25968]

Thread 42 Crashed:
0   libmediainfo.0.dylib          	       0x382f2d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1   libmediainfo.0.dylib          	       0x382b657b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2   libmediainfo.0.dylib          	       0x382b609e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3   libmediainfo.0.dylib          	       0x382fd55dc MediaInfo_Open + 176
4   libffi.dylib                  	       0x1a2af5050 ffi_call_SYSV + 80

We're calling MediaInfo from Python via pymediainfo:

import platform

import pymediainfo
from pymediainfo import MediaInfo

print(f"{pymediainfo.__version__}")
print(f"macOS {platform.mac_ver()}")
print(MediaInfo._get_library())

prints:

7.0.1
macOS ('15.0.1', ('', '', ''), 'arm64')
(<CDLL '/Users/maxlund/pymediainfo-test/.venv/lib/python3.10/site-packages/pymediainfo/libmediainfo.0.dylib', handle 65874810 at 0x122263d30>, 105553140273472, '24.12', (24, 12))

I downloaded libmediainfo via Homebrew and see that it's a later version:

print(MediaInfo._get_library("/opt/homebrew/Cellar/libmediainfo/25.04/lib/libmediainfo.0.dylib"))
-> (<CDLL '/opt/homebrew/Cellar/libmediainfo/25.04/lib/libmediainfo.0.dylib', handle 65874830 at 0x133723ac0>, 105553151484000, '25.04', (25, 4))

Do you think anything would be fixed by changing to that version? More logs from all the crashes we've seen (so far):

Thread 36 Crashed:
0   libmediainfo.0.dylib          	       0x344d5d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1   libmediainfo.0.dylib          	       0x3449957b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2   libmediainfo.0.dylib          	       0x3449909e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3   libmediainfo.0.dylib          	       0x344e055dc MediaInfo_Open + 176
Thread 27 Crashed:
0   libmediainfo.0.dylib          	       0x38fd6d554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1   libmediainfo.0.dylib          	       0x38f9a57b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2   libmediainfo.0.dylib          	       0x38f9a09e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3   libmediainfo.0.dylib          	       0x38fe155dc MediaInfo_Open + 176
4   libffi.dylib                  	       0x1a2af5050 ffi_call_SYSV + 80
Thread 24 Crashed:
0   libmediainfo.0.dylib          	       0x359d21554 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 808
1   libmediainfo.0.dylib          	       0x3599597b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
2   libmediainfo.0.dylib          	       0x3599549e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
3   libmediainfo.0.dylib          	       0x359dc95dc MediaInfo_Open + 176
Thread 35 Crashed:
0   libsystem_kernel.dylib        	       0x196aba5d0 __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x196af2c20 pthread_kill + 288
2   libsystem_c.dylib             	       0x1969ffa30 abort + 180
3   libsystem_malloc.dylib        	       0x19690fdc4 malloc_vreport + 896
4   libsystem_malloc.dylib        	       0x196913430 malloc_report + 64
5   libsystem_malloc.dylib        	       0x19692d494 find_zone_and_free + 528
6   libmediainfo.0.dylib          	       0x40d739938 MediaInfoLib::File__Analyze::Stream_Prepare(MediaInfoLib::stream_t, unsigned long) + 1408
7   libmediainfo.0.dylib          	       0x40dae2c94 MediaInfoLib::File_Mxf::Streams_Finish_Essence(unsigned int, ZenLib::uint128) + 1196
8   libmediainfo.0.dylib          	       0x40dae1b74 MediaInfoLib::File_Mxf::Streams_Finish_Track(ZenLib::uint128) + 156
9   libmediainfo.0.dylib          	       0x40dae1878 MediaInfoLib::File_Mxf::Streams_Finish_Package(ZenLib::uint128) + 152
10  libmediainfo.0.dylib          	       0x40dae0e24 MediaInfoLib::File_Mxf::Streams_Finish_ContentStorage(ZenLib::uint128) + 144
11  libmediainfo.0.dylib          	       0x40dae0504 MediaInfoLib::File_Mxf::Streams_Finish_Preface(ZenLib::uint128) + 128
12  libmediainfo.0.dylib          	       0x40dad96cc MediaInfoLib::File_Mxf::Streams_Finish() + 604
13  libmediainfo.0.dylib          	       0x40d70ec7c MediaInfoLib::File__Analyze::ForceFinish(char const*) + 1064
14  libmediainfo.0.dylib          	       0x40daf1898 MediaInfoLib::File_Mxf::Read_Buffer_AfterParsing() + 476
15  libmediainfo.0.dylib          	       0x40d70e5b8 MediaInfoLib::File__Analyze::Open_Buffer_Continue_Loop() + 444
16  libmediainfo.0.dylib          	       0x40d70d934 MediaInfoLib::File__Analyze::Open_Buffer_Continue(unsigned char const*, unsigned long) + 1344
17  libmediainfo.0.dylib          	       0x40d70ff20 MediaInfoLib::File__Analyze::Open_Buffer_Finalize(bool) + 312
18  libmediainfo.0.dylib          	       0x40d7c194c MediaInfoLib::MediaInfo_Internal::Open_Buffer_Finalize() + 52
19  libmediainfo.0.dylib          	       0x40db86b7c MediaInfoLib::Reader_File::Format_Test_PerParser_Continue(MediaInfoLib::MediaInfo_Internal*) + 3372
20  libmediainfo.0.dylib          	       0x40db85ce0 MediaInfoLib::Reader_File::Format_Test_PerParser(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 1104
21  libmediainfo.0.dylib          	       0x40db856c8 MediaInfoLib::Reader_File::Format_Test(MediaInfoLib::MediaInfo_Internal*, std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>>) + 1180
22  libmediainfo.0.dylib          	       0x40d7bd7b4 MediaInfoLib::MediaInfo_Internal::Entry() + 19188
23  libmediainfo.0.dylib          	       0x40d7b89e0 MediaInfoLib::MediaInfo_Internal::Open(std::__1::basic_string<wchar_t, std::__1::char_traits<wchar_t>, std::__1::allocator<wchar_t>> const&) + 504
24  libmediainfo.0.dylib          	       0x40dc2d5dc MediaInfo_Open + 176

Any assistance is much appreciated. Thanks for all the great work!

maxlund avatar Jun 04 '25 08:06 maxlund

Please provide the file ([email protected] if it can not be shared publicly).

JeromeMartinez avatar Jun 04 '25 11:06 JeromeMartinez

We haven't been able to pinpoint which file actually caused the issue, as we ran through thousands of files and weren't able to log an exception due to the crash. Currently trying to reproduce the crash with verbose logging, stay tuned.
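One possible way to find the file behind a native crash, sketched here as an assumption and not something done in this thread: Python's standard `faulthandler` module can dump the Python traceback of every thread when the process receives SIGSEGV, and a log line written just before each call leaves a breadcrumb naming the file. The helper name `parse_with_breadcrumb`, the logger name, and the log file location are hypothetical.

```python
import faulthandler
import logging
import os
import tempfile

# Hypothetical location; a real file is used because sys.stderr can be
# unavailable in a windowed application.
CRASH_LOG_PATH = os.path.join(tempfile.gettempdir(), "mediainfo_faulthandler.log")
_crash_log = open(CRASH_LOG_PATH, "w")  # must stay open while the handler is enabled

# On SIGSEGV and similar fatal signals, dump the traceback of all threads.
faulthandler.enable(file=_crash_log, all_threads=True)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(threadName)s - %(message)s",
)
log = logging.getLogger("crash-hunt")


def parse_with_breadcrumb(parse_func, path):
    """Log the path before and after calling parse_func(path)."""
    log.info("about to parse %s", path)  # last line seen if the call crashes
    result = parse_func(path)
    log.info("finished %s", path)
    return result
```

With several threads, the thread name in each log line helps pair an "about to parse" entry that has no matching "finished" entry with the crashed thread.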

maxlund avatar Jun 04 '25 15:06 maxlund

However, we know that we can't reliably reproduce it since it occurred for the same set of MXF files that I've now successfully parsed using the MediaInfo lib.

maxlund avatar Jun 04 '25 15:06 maxlund

However, we know that we can't reliably reproduce it since it occurred for the same set of MXF files that I've now successfully parsed using the MediaInfo lib.

That is typical of memory corruption. Tools like Valgrind can pinpoint where it happens more precisely, if you can test with Valgrind (or we can, if you provide the file).

JeromeMartinez avatar Jun 04 '25 16:06 JeromeMartinez

I just managed to reproduce it by running two threads through a couple of thousand MXF files. These are the last lines of the application logs:

2025-06-04 18:36:34,042 - INFO - pymediainfo.classify /Users/Shared/AvidMediaComposer/Avid MediaFiles/MXF/1/2ee500d1V01.C32CBEA2.608300.mxf

2025-06-04 18:36:34,100 - INFO - pymediainfo.classify /Volumes/Macintosh HD/Users/Shared/AvidMediaComposer/Avid MediaFiles/MXF/1/0cce979fV01.C1582149.608300.mxf

Resulting in these macOS crash logs:

Crashed Thread:        32

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000415111001200
Exception Codes:       0x0000000000000001, 0x0000415111001200

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [26810]

Thread 32 Crashed:
0   libsystem_platform.dylib      	       0x192d5be40 _platform_memmove + 144
1   libmediainfo.0.dylib          	       0x31da0b380 MediaInfoLib::MediaInfo_Internal::Get(MediaInfoLib::stream_t, unsigned long, unsigned long, MediaInfoLib::info_t) + 2228
2   libmediainfo.0.dylib          	       0x31d9f4c0c MediaInfoLib::MediaInfo_Internal::Inform(MediaInfoLib::stream_t, unsigned long, bool) + 9064
3   libmediainfo.0.dylib          	       0x31d9f10ac MediaInfoLib::MediaInfo_Internal::Inform() + 12516
4   libmediainfo.0.dylib          	       0x31da11528 MediaInfoLib::MediaInfo_Internal::Inform(std::__1::vector<MediaInfoLib::MediaInfo_Internal*, std::__1::allocator<MediaInfoLib::MediaInfo_Internal*>>&) + 13280
5   libmediainfo.0.dylib          	       0x31da0dfdc MediaInfoLib::MediaInfo_Internal::Inform(MediaInfoLib::MediaInfo_Internal*) + 120
6   libmediainfo.0.dylib          	       0x31d9c8bf0 MediaInfoLib::MediaInfo::Inform(unsigned long) + 32
7   libmediainfo.0.dylib          	       0x31dddcb5c MediaInfo_Inform + 160

It's most likely the file in the last line of the logs that causes the issue:

0cce979fV01.C1582149.608300.mxf

... but I can't say for sure, since the files in the logs come from separate threads.

I've emailed both files to [email protected] right now.

maxlund avatar Jun 04 '25 16:06 maxlund

Ok I can reliably reproduce it by simulating a bunch of concurrent calls:

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

from pymediainfo import MediaInfo

MEDIAINFO_LIB_PATH = os.path.expanduser('~/libmediainfo.0.dylib')
N_THREADS = 32
RUNS_PER_THREAD = 100

paths = [
    os.path.expanduser('~/mxf-error-file/0cce979fV01.C1582149.608300.mxf'),
    os.path.expanduser('~/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf')
]
abort_flag = False


def mediainfo_parse(path):
    global abort_flag
    if abort_flag:
        return None
    try:
        print(f"Parsing {path}")
        MediaInfo.parse(filename=path, library_file=MEDIAINFO_LIB_PATH, mediainfo_options={
            "setlocale_LC_CTYPE": "",
            "CharSet": "UTF-8"
        })
        return f'{path} parsed OK'
    except Exception as e:
        abort_flag = True
        print(f"Error parsing {path}: {e}")
        print("Setting abort flag to stop spam...")
        return None


def _worker(repeats: int):
    for _ in range(repeats):
        for p in paths:
            mediainfo_parse(p)


if __name__ == "__main__":
    print(f"MediaInfo._get_library: {MediaInfo._get_library(MEDIAINFO_LIB_PATH)[2:]}")
    print("Running synchronously...")
    for i in range(10):
        for p in paths:
            mediainfo_parse(p)
    print("Done running synchronously")

    print("Starting concurrent calls...")
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(_worker, RUNS_PER_THREAD) for _ in range(N_THREADS)]
        for f in as_completed(futures):
            f.result()

Output:

└─ $ ▶  python mediainfo_crash.py
MediaInfo._get_library: ('25.04', (25, 4))
Running synchronously...
Parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf
... etc
Parsing /Users/maxlund/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf
Done running synchronously
Starting concurrent calls...
Parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf
.. etc
Parsing /Users/maxlund/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0
Setting abort flag to stop spam...
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0
Setting abort flag to stop spam...
Error parsing /Users/maxlund/mxf-error-file/0cce979fV01.C1582149.608300.mxf: syntax error: line 1, column 0

Guess we'll just add a lock for now. Not a huge deal, but it might block us and impact performance in some situations.
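A minimal sketch of what such a lock could look like, assuming a single process-wide `threading.Lock` is acceptable. The names `locked_call` and `fake_parse` are hypothetical; `fake_parse` only stands in for the real parse call so the sketch runs without libmediainfo installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock for the whole process.
_mediainfo_lock = threading.Lock()


def locked_call(parse_func, *args, **kwargs):
    """Call parse_func while holding the lock, so only one call runs at a time."""
    with _mediainfo_lock:
        return parse_func(*args, **kwargs)


def fake_parse(path):
    # Hypothetical stand-in for the real parse call.
    return f"{path} parsed OK"


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: locked_call(fake_parse, p), ["a.mxf", "b.mxf"]))
print(results)  # ['a.mxf parsed OK', 'b.mxf parsed OK']
```

With pymediainfo, the call would presumably become `locked_call(MediaInfo.parse, filename=path, library_file=MEDIAINFO_LIB_PATH, mediainfo_options={...})`. The cost is the one mentioned above: all parsing is serialized, so the threads no longer overlap while inside the library.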

maxlund avatar Jun 04 '25 17:06 maxlund

I've emailed both files to [email protected] right now.

Received and we can reproduce the issue.

JeromeMartinez avatar Jun 05 '25 13:06 JeromeMartinez

Ok I can reliably reproduce it by simulating a bunch of concurrent calls:

When a configuration parameter is provided, pymediainfo clears the MediaInfo configuration at the end of the parse() function. Since the configuration is shared by all threads, this causes inconsistencies and crashes, and should be avoided when using multiple threads, as indicated in the documentation: https://github.com/sbraz/pymediainfo/blob/master/src/pymediainfo/__init__.py#L435-L438

This is definitely something that needs to be changed on MediaInfo's side. In the meantime, though, you can set the options before calling parse(), like this:

#!/usr/bin/python

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

from pymediainfo import MediaInfo

MEDIAINFO_LIB_PATH = os.path.expanduser('~/mxf-error-file/libmediainfo.so.0.0.0')
N_THREADS = 32
RUNS_PER_THREAD = 100

paths = [
    os.path.expanduser('~/mxf-error-file/0cce979fV01.C1582149.608300.mxf'),
    os.path.expanduser('~/mxf-error-file/2ee500d1V01.C32CBEA2.608300.mxf')
]
abort_flag = False


def mediainfo_parse(path):
    global abort_flag
    if abort_flag:
        return None
    try:
        print(f"Parsing {path}")
        MediaInfo.parse(filename=path, library_file=MEDIAINFO_LIB_PATH)
        return f'{path} parsed OK'
    except Exception as e:
        abort_flag = True
        print(f"Error parsing {path}: {e}")
        print("Setting abort flag to stop spam...")
        return None


def _worker(repeats: int):
    for _ in range(repeats):
        for p in paths:
            mediainfo_parse(p)


if __name__ == "__main__":
    print(f"MediaInfo._get_library: {MediaInfo._get_library(MEDIAINFO_LIB_PATH)[2:]}")
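    # Set the options once, up front, instead of passing mediainfo_options to
    # parse(), so that parse() does not clear the shared configuration.
    lib, handle = MediaInfo._get_library(MEDIAINFO_LIB_PATH)[0:2]
    lib.MediaInfo_Option(handle, "setlocale_LC_CTYPE", "")
    lib.MediaInfo_Option(handle, "CharSet", "UTF-8")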

    print("Running synchronously...")
    for i in range(10):
        for p in paths:
            mediainfo_parse(p)
    print("Done running synchronously")

    print("Starting concurrent calls...")
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(_worker, RUNS_PER_THREAD) for _ in range(N_THREADS)]
        for f in as_completed(futures):
            f.result()

g-maxime avatar Jun 06 '25 11:06 g-maxime

I didn't notice @maxlund was using options. Thanks for the debugging and the snippet, @g-maxime! Indeed, calling MediaInfo_Option manually will work because it won't trigger https://github.com/sbraz/pymediainfo/blob/daf3596e33686c17639d4bd1a4f560983f24ea35/src/pymediainfo/__init__.py#L551-L552. I could add an option to not clear options at the end of parse(), but I feel like it has too many options already 😅

I won't add this example to the documentation as it calls a private method but if I ever receive similar bug reports, I can point users to your comment until the library's thread-safety problem is fixed.

sbraz avatar Jun 07 '25 23:06 sbraz

Aha, thanks @g-maxime! So the .MXF files were a red herring. Sounds like I can just add the two lines calling MediaInfo_Option once during application startup, and remove the concurrency safeguards I added then.

Just FYI the reason I'm supplying options to the calls is that otherwise using non-ASCII paths failed. One of our users had their MXF files here:

/Volumes/avid06’s Mac Studio.1/some-mxf-file.mxf

This failed because of the "Mac-style" apostrophe (’) in avid06’s. It is not the case when calling the MediaInfo CLI via Python subprocess calls. I assume the locale is inherited from the parent process, which is why I never caught this when testing locally: I would either launch our binary or run the code directly from a Terminal or IDE that already had a UTF-8 locale.
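As a quick illustration (not part of the original report), the check below shows which character in that path is non-ASCII and prints the process's current LC_CTYPE setting. The helper name `non_ascii_chars` is hypothetical; the path is the example from above.

```python
import locale


def non_ascii_chars(path):
    """Return (index, code point) for every non-ASCII character in path."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(path) if not ch.isascii()]


path = "/Volumes/avid06’s Mac Studio.1/some-mxf-file.mxf"

print(non_ascii_chars(path))        # [(15, 'U+2019')]
print(path.encode("utf-8")[15:18])  # b'\xe2\x80\x99', three bytes for one character

# Query (not change) the current LC_CTYPE; the value depends on how the
# process was launched, so no expected output is shown here.
print(locale.setlocale(locale.LC_CTYPE, None))
```

Comparing the last line between a Terminal launch and a packaged-app launch is one way to test the assumption that the locale differs between the two.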

maxlund avatar Jun 08 '25 06:06 maxlund

@maxlund The CharSet is always forced to UTF-8: https://github.com/sbraz/pymediainfo/blob/daf3596e33686c17639d4bd1a4f560983f24ea35/src/pymediainfo/__init__.py#L487. I'm surprised that you cannot parse paths with non-ASCII names. Does the pymediainfo test suite pass on your system? I'm especially interested in pytest tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file.

sbraz avatar Jun 08 '25 22:06 sbraz

└─ $ ▶  pytest tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file
=============================================================================================== test session starts ================================================================================================
platform darwin -- Python 3.10.9, pytest-8.4.0, pluggy-1.6.0 -- /usr/local/bin/python3
cachedir: .pytest_cache
rootdir: /Users/maxlund/pymediainfo
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 1 item

tests/test_pymediainfo.py::MediaInfoUnicodeFileNameTest::test_parse_unicode_file PASSED                                                                                                                      [100%]

================================================================================================ 1 passed in 0.02s =================================================================================================

This issue only shows up once I build a PyInstaller binary containing the pymediainfo calls, so the test passing is expected: I had no issues when testing locally, only after actually building the binary. We could probably create a minimal example with just PyInstaller and pymediainfo as dependencies. For reference, I build with something like:

pyinstaller --windowed main.py --noconfirm

maxlund avatar Jun 09 '25 10:06 maxlund

So you were actually hitting https://github.com/sbraz/pymediainfo/issues/121 then? And forcing setlocale_LC_CTYPE fixes it?

sbraz avatar Jun 09 '25 23:06 sbraz

@sbraz Sorry for the late reply, I missed your message somehow - yes, this fixes things:

        lib, handle = MediaInfo._get_library(MEDIAINFO_LIB_PATH)[0:2]
        lib.MediaInfo_Option(handle, "setlocale_LC_CTYPE", "")
        lib.MediaInfo_Option(handle, "CharSet", "UTF-8")

I also do:

            media_info = MediaInfo.parse(filename=file_path, library_file=MEDIAINFO_LIB_PATH, mediainfo_options={
                "setlocale_LC_CTYPE": "",  # empty string means use the process current LC_CTYPE
                "CharSet": "UTF-8"
            })

Maybe setting mediainfo_options in the parse() calls is actually not a good idea? It just feels "wrong" to rely on those being set implicitly.

maxlund avatar Sep 09 '25 06:09 maxlund

Looking at the code, I would want to avoid this though I suppose?

        # Reset all options to their defaults so that they aren't
        # retained when the parse method is called several times
        # https://github.com/MediaArea/MediaInfoLib/issues/1128
        # Do not call it when it is not required because it breaks threads
        # https://github.com/sbraz/pymediainfo/issues/76#issuecomment-575245093
        if mediainfo_options is not None and lib_version >= (19, 9):
            lib.MediaInfo_Option(handle, "Reset", "")
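To illustrate why a reset of shared configuration is a problem with threads, here is a toy model. It is not pymediainfo or MediaInfoLib code: a plain dict stands in for the shared configuration, and two `threading.Event` objects force the interleaving "thread A sets its options, thread B finishes and resets, thread A reads". All names are hypothetical.

```python
import threading

# Toy model only: a dict standing in for a configuration shared by every thread.
DEFAULTS = {"CharSet": ""}
shared_config = dict(DEFAULTS)


def run_toy_race():
    """Force 'A sets options, B resets, A reads' and return what A sees."""
    a_configured = threading.Event()
    b_reset = threading.Event()
    seen_by_a = {}

    def thread_a():
        shared_config["CharSet"] = "UTF-8"  # A applies its options
        a_configured.set()
        b_reset.wait()                      # B's parse ends in the meantime
        seen_by_a["CharSet"] = shared_config["CharSet"]

    def thread_b():
        a_configured.wait()
        shared_config.clear()
        shared_config.update(DEFAULTS)      # B's end-of-parse reset
        b_reset.set()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_by_a["CharSet"]


print(repr(run_toy_race()))  # '' : A's "UTF-8" was wiped by B's reset
```

In this toy the only symptom is a wrong value. It does not attempt to reproduce the crashes reported above, which happen inside native code.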

maxlund avatar Sep 09 '25 06:09 maxlund

If you don't call parse() several times in multiple threads, you should be fine.

sbraz avatar Sep 09 '25 20:09 sbraz

If you don't call parse() several times in multiple threads, you should be fine.

Understood, but we could have situations where that happens. Still, using a runtime hook in PyInstaller to set the locale, together with MediaInfo_Option, seems to work fine with non-ASCII file paths.

maxlund avatar Sep 09 '25 21:09 maxlund

Thanks for the feedback. Here is the link to said hook, to help PyInstaller users: https://github.com/sbraz/pymediainfo/issues/121#issuecomment-3269037665.

sbraz avatar Sep 09 '25 21:09 sbraz