[ffmpeg] process exits without any output on GHA
Description / Steps to reproduce the issue
Run ffmpeg -version on GHA. Sometimes it will just output nothing and exit with 0.
See https://github.com/kasper93/msys2-stdout-test for the testcase and strace output. test.sh runs in loop ffmpeg and checks if output is correct, after some time it just outputs nothing, in which case I print strace output.
I couldn't reproduce this locally. I'm not certain where the actual root cause is, I'm happy to help debug this, but not sure where to start. I tried to reproduce it with small standalone application, but it didn't work the same way. I presume there might be something related to dozen of dependencies loaded first, but not sure.
The common factor for the failure is running it on GHA and ffmpeg itself (but I don't test other apps...). Though the same ffmpeg binary run locally on hardware doesn't exhibit the issue, in both my tests, and another FATE runner. Also looks like 32-bit version is not affected.
In real world examples it fails randomly FFmpeg FATE tests (ex. https://fate.ffmpeg.org/report.cgi?slot=msys2-clang64&time=20251126185058). pixfmts tests are affected, because they run a lot of ffmpeg.exe calls, but I've seen also smaller tests fail without output.
Also similar failure sometimes happens when during mpv configure ffmpeg version couldn't be determined (ex. https://github.com/mpv-player/mpv/actions/runs/19418306401/job/55550896472)
test/meson.build:136:17: ERROR: Could not find a version number in output of ['D:\\a\\_temp\\msys64\\clang64\\bin/ffmpeg.EXE', '-version']
Any ideas?
Expected behavior
ffmpeg -version outputs version or non-zero exit code on error. stdout if flushed at newline and on program exit.
Actual behavior
ffmpeg -version doesn't output version and zero exit code, without error.
Verification
- [x] I have verified that my MSYS2 is up-to-date before submitting the report (see https://www.msys2.org/docs/updating/)
Windows Version
MINGW64_NT-10.0-26100
MINGW environments affected
- [x] MINGW64
- [ ] MINGW32
- [x] UCRT64
- [x] CLANG64
- [x] CLANGARM64
Are you willing to submit a PR?
No response
It reproduces for me locally. Not sure how to debug it though.
Random things:
- rebuild doesn't help
- ~~not reproducible with mingw64, but with ucrt64/clang64, so maybe ucrt related?~~ Correction: happened with mingw64 too, just rarely.
Adding a fflush(NULL) at the end of ffmpeg fixes things. @lb90 you've been patching glib for a related thing afair, do you have any idea what could be a cause for this?
Hi! Sort of, yes.
The CRT must flush all open FILE streams on application exit (as required by the C standard). However that final flush is not very safe...
it's done from the CRT's DllMain on DLL_PROCESS_DETACH. This brings some issues:
- If any other DLL acquires a lock (on
DLL_PROCESS_DETACH) before the CRT's DllMain is run, the process wil be terminated abruptly and flushing will be skipped. Consider however that the process is terminated with the same exit code passed to exit / ExitProcess. So the app may exit with 0 even though flushing was skipped. - To make things worse, the final flushing operation must acquire a lock for each open FILE stream. So if a thread is doing I/O on a stream when exit is called, some streams will be left unflushed.
In short, the CRT goes against the official rules for DllMain :-)
BTW, I plan to fix that in mingw-w64-crt. However the issue remains in MSVC builds
@lazka can you reproduce under a debugger? If so, try setting a breakpoint on ZwTerminateProcess (or NtTerminateProcess). Normally it's called only one time, but in case of abrupt exit it's called two times. The stacktrace on the second hit should give useful informations
An alternaitve is to enable Loader Snaps output
EDIT: Hey, I can reproduce too! Let's see...
EDIT: Hey, I can reproduce too! Let's see...
I actually can repro now too. It was a mix of different builds that made it looks like it is not locally. Without looking into details I made some minimization of the build to see what triggers the issue.
It looks like you need both --enable-libssh --enable-librsvg, other options are not relevant. If either of those libs are excluded, issue does not reproduce anymore. Saying that in case it helps you focus on specific area.
EDIT: This also explains with mingw32 is not affected, because librsvg is not available there.
That's what I could get:
Thread 1 hit Breakpoint 1, 0x00007fff79682230 in ntdll!ZwTerminateProcess () from C:\WINDOWS\SYSTEM32\ntdll.dll
#0 0x00007fff79682230 in ntdll!ZwTerminateProcess () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x00007fff7952b01b in ntdll!RtlRaiseException () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2 0x00007fff7952c5ff in ntdll!RtlSleepConditionVariableCS () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3 0x00007fff7952d872 in ntdll!RtlEnterCriticalSection () from C:\WINDOWS\SYSTEM32\ntdll.dll
#4 0x00007fff7952d39f in ntdll!RtlSleepConditionVariableCS () from C:\WINDOWS\SYSTEM32\ntdll.dll
#5 0x00007fff7952ce74 in ntdll!RtlSleepConditionVariableCS () from C:\WINDOWS\SYSTEM32\ntdll.dll
#6 0x00007fff79535541 in ntdll!LdrGetDllHandle () from C:\WINDOWS\SYSTEM32\ntdll.dll
#7 0x00007fff76ddfc86 in KERNELBASE!GetModuleHandleW () from C:\WINDOWS\System32\KernelBase.dll
#8 0x00007fff7909fb88 in SHCORE!IStream_Reset () from C:\WINDOWS\System32\SHCore.dll
#9 0x00007fff7909fb39 in SHCORE!IStream_Reset () from C:\WINDOWS\System32\SHCore.dll
#10 0x00007fff790664c1 in ?? () from C:\WINDOWS\System32\SHCore.dll
#11 0x00007fff79064d99 in ?? () from C:\WINDOWS\System32\SHCore.dll
#12 0x00007fff768abc75 in ucrtbase!bsearch_s () from C:\WINDOWS\System32\ucrtbase.dll
#13 0x00007fff768ab897 in ucrtbase!_execute_onexit_table () from C:\WINDOWS\System32\ucrtbase.dll
#14 0x00007fff768ab84d in ucrtbase!_execute_onexit_table () from C:\WINDOWS\System32\ucrtbase.dll
#15 0x00007fff768d2f0c in ucrtbase!_stricmp_l () from C:\WINDOWS\System32\ucrtbase.dll
#16 0x00007fff790c3235 in UnregisterScaleChangeEvent () from C:\WINDOWS\System32\SHCore.dll
#17 0x00007fff790c335e in UnregisterScaleChangeEvent () from C:\WINDOWS\System32\SHCore.dll
#18 0x00007fff7967f96e in ntdll!RtlEncodeRemotePointer () from C:\WINDOWS\SYSTEM32\ntdll.dll
#19 0x00007fff7952bcae in ntdll!RtlRaiseException () from C:\WINDOWS\SYSTEM32\ntdll.dll
#20 0x00007fff795ad4af in ntdll!LdrShutdownProcess () from C:\WINDOWS\SYSTEM32\ntdll.dll
#21 0x00007fff795ac67e in ntdll!RtlExitUserProcess () from C:\WINDOWS\SYSTEM32\ntdll.dll
#22 0x00007fff78c518ab in KERNEL32!FatalExit () from C:\WINDOWS\System32\kernel32.dll
#23 0x00007fff76900093 in ucrtbase!logbf () from C:\WINDOWS\System32\ucrtbase.dll
#24 0x00007ff6404b13d9 in ?? ()
#25 0x00007ff6404b1416 in ?? ()
#26 0x00007fff78c3e8d7 in KERNEL32!BaseThreadInitThunk () from C:\WINDOWS\System32\kernel32.dll
#27 0x00007fff795ac53c in ntdll!RtlUserThreadStart () from C:\WINDOWS\SYSTEM32\ntdll.dll
#28 0x0000000000000000 in ?? ()
Symbols are not exactly right because gdb can use only named exports
It looks like you need both
--enable-libssh --enable-librsvg, other options are not relevant. If either of those libs are excluded, issue does not reproduce anymore.
I've run full tests, and without libssh, it still failed. It just happens to reproduce way easier in small -version testcase with it. Actually removing librsvg fixes it in all cases.
EDIT: Also in check(), you shouldn't do that https://github.com/msys2/MINGW-packages/blob/f5153f7df726235641ccdc2f9245e01970a5834c/mingw-w64-ffmpeg/PKGBUILD#L216-L216 it's good to test before deploying.
Ok, here's a detailed stacktrace obtained with VS:
> ntdll.dll!NtTerminateProcess() Unknown
ntdll.dll!RtlpWaitOnCriticalSection() Unknown
ntdll.dll!RtlpEnterCriticalSectionContended() Unknown
ntdll.dll!RtlEnterCriticalSection() Unknown
ntdll.dll!LdrpAddUnicodeStringToSnapsBuffer() Unknown
ntdll.dll!LdrpLogInternal() Unknown
ntdll.dll!LdrGetDllHandle() Unknown
KERNELBASE.dll!00007fff76ddfc86() Unknown
SHCore.dll!wil_details_GetNtDllModuleHandle(void) Unknown
SHCore.dll!wil::details::RtlDllShutdownInProgress(void) Unknown
SHCore.dll!wil::ProcessShutdownInProgress(void) Unknown
SHCore.dll!wil::details::`dynamic atexit destructor for 'g_enabledStateManager''() Unknown
ucrtbase.dll!<lambda>(void)() Unknown
ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c>>() Unknown
ucrtbase.dll!_execute_onexit_table() Unknown
ucrtbase.dll!__crt_state_management::wrapped_invoke<int (*)(int *),int *,int>(int (*)(int *),int *) Unknown
SHCore.dll!dllmain_crt_process_detach() Unknown
SHCore.dll!dllmain_dispatch() Unknown
ntdll.dll!LdrpCallInitRoutineInternal() Unknown
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrShutdownProcess() Unknown
ntdll.dll!RtlExitUserProcess() Unknown
kernel32.dll!ExitProcessImplementation() Unknown
ucrtbase.dll!common_exit() Unknown
ffmpeg.exe!00007ff6404b13d9() Unknown
ffmpeg.exe!00007ff6404b1416() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
So it's actually an issue in SHCore.dll. More specifically in WIL
So it's actualy point 1 from https://github.com/msys2/MINGW-packages/issues/26646#issuecomment-3597423787.
That said, I'd avoid depending on the late flush implemented by the CRT. A solution is to add an early flush from the ffmpeg executable:
static void
safe_flush (void)
{
fflush (NULL);
}
int main()
{
atexit (safe_flush);
/* ... */
}
Maybe this will be added to mingw-w64-crt, but will be needed for MSVC in any case
Actually removing librsvg fixes it in all cases.
Ah, this is interesting :)
$ gdb --args ffmpeg -version
GNU gdb (GDB) 16.3
Copyright (C) 2024 Free Software Foundation, Inc.
Reading symbols from ffmpeg...
(No debugging symbols found in ffmpeg)
(gdb) set stop-on-solib-events 1
(gdb) r
Starting program: D:\msys64\ucrt64\bin\ffmpeg.exe -version
Stopped due to shared library event:
Inferior loaded C:\WINDOWS\System32\imm32.dll
(gdb) c
(...)
Stopped due to shared library event:
Inferior loaded C:\WINDOWS\System32\SHCore.dll
(gdb) bt
#0 0x00007fff796821c4 in ntdll!ZwMapViewOfSection () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x00007fff795253db in ntdll!RtlAppendUnicodeStringToString () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2 0x00007fff79524e1e in ntdll!RtlAppendUnicodeStringToString () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3 0x00007fff795249f1 in ntdll!RtlAppendUnicodeStringToString () from C:\WINDOWS\SYSTEM32\ntdll.dll
#4 0x00007fff7953ad95 in ntdll!RtlDeactivateActivationContextUnsafeFast () from C:\WINDOWS\SYSTEM32\ntdll.dll
#5 0x00007fff79536382 in ntdll!LdrGetDllHandleEx () from C:\WINDOWS\SYSTEM32\ntdll.dll
#6 0x00007fff79536996 in ntdll!LdrGetDllHandleEx () from C:\WINDOWS\SYSTEM32\ntdll.dll
#7 0x00007fff7956d7d3 in ntdll!RtlGuardCheckLongJumpTarget () from C:\WINDOWS\SYSTEM32\ntdll.dll
#8 0x00007fff7956b94d in ntdll!LdrResolveDelayLoadedAPI () from C:\WINDOWS\SYSTEM32\ntdll.dll
#9 0x00007fff7763bcb2 in SHELL32_SendToMenu_VerifyTargetedCommand () from C:\WINDOWS\System32\shell32.dll
#10 0x00007fff77737d65 in StrStrW () from C:\WINDOWS\System32\shell32.dll
#11 0x00007ff72d863fb8 in ?? ()
#12 0x00007ff72d865ad2 in ?? ()
#13 0x00007ff72d853cb0 in ?? ()
#14 0x00007ff72d87f5b2 in ?? ()
#15 0x00007ff72d8310c9 in ?? ()
#16 0x00007ff72d831416 in ?? ()
#17 0x00007fff78c3e8d7 in KERNEL32!BaseThreadInitThunk () from C:\WINDOWS\System32\kernel32.dll
#18 0x00007fff795ac53c in ntdll!RtlUserThreadStart () from C:\WINDOWS\SYSTEM32\ntdll.dll
#19 0x0000000000000000 in ?? ()
(gdb)
Same stacktrace on VS:
> ntdll.dll!NtMapViewOfSection() Unknown
ntdll.dll!LdrpMinimalMapModule() Unknown
ntdll.dll!LdrpMapDllWithSectionHandle() Unknown
ntdll.dll!LdrpLoadKnownDll() Unknown
ntdll.dll!LdrpFindOrPrepareLoadingModule() Unknown
ntdll.dll!LdrpLoadDllInternal() Unknown
ntdll.dll!LdrpLoadForwardedDll() Unknown
ntdll.dll!LdrpHandleProtectedDelayload() Unknown
ntdll.dll!LdrResolveDelayLoadedAPI() Unknown
shell32.dll!__delayLoadHelper2() Unknown
shell32.dll!__tailMerge_api_ms_win_shcore_obsolete_l1_1_0_dll() Unknown
ffmpeg.exe!00007ff72d863fb8() Unknown
ffmpeg.exe!00007ff72d865ad2() Unknown
ffmpeg.exe!00007ff72d853cb0() Unknown
ffmpeg.exe!00007ff72d87f5b2() Unknown
ffmpeg.exe!00007ff72d8310c9() Unknown
ffmpeg.exe!00007ff72d831416() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Turns out ffmpeg calls CommandLineToArgvW and this causes SHCore.dll to be loaded. Do you know why that happens with --enable-librsvg?
Turns out ffmpeg calls
CommandLineToArgvWand this causesSHCore.dllto be loaded. Do you know why that happens with --enable-librsvg?
CommandLineToArgvW is always called, regardless of any dependencies.
That said, I'd avoid depending on the late flush implemented by the CRT. A solution is to add an early flush from the ffmpeg executable:
I don't think this is acceptable, and since it is caused by single library, I don't think it is necessary. As ffmpeg mostly is developed on linux workarounds like that may be hard to upstream.
So it's actually an issue in SHCore.dll. More specifically in WIL
I don't see this issue, when running the ffmpeg.exe -version
It's clang64 static build.
I get single, terminate
[0x0] ntdll!NtTerminateProcess+0x14 0x5be77f8e8 0x7ff86f8918ab
[0x1] KERNEL32!ExitProcessImplementation+0xb 0x5be77f8f0 0x7ff86dc60093
[0x2] ucrtbase!common_exit+0xc7 0x5be77f920 0x7ff7d99d139a
[0x3] ffmpeg_g!WinMainCRTStartup+0x39a 0x5be77f980 0x7ff7d99d13b6
[0x4] ffmpeg_g!mainCRTStartup+0x16 0x5be77fa00 0x7ff86f87e8d7
[0x5] KERNEL32!BaseThreadInitThunk+0x17 0x5be77fa30 0x7ff870a4c53c
[0x6] ntdll!RtlUserThreadStart+0x2c 0x5be77fa60 0x0
After that also terminate for console, but that's not related to the ffmpeg anymore.
[0x0] ntdll!NtTerminateProcess 0x90b4b7fd68 0x7ff870a4c627
[0x1] ntdll!RtlExitUserProcess+0x47 0x90b4b7fd70 0x7ff86f8918ab
[0x2] KERNEL32!ExitProcessImplementation+0xb 0x90b4b7fda0 0x7ff7b4ce56a8
[0x3] conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit+0x48 0x90b4b7fdd0 0x7ff7b4d72a2a
[0x4] conhost!Microsoft::Console::HostSignalInputThread::_Shutdown+0xe 0x90b4b7fe00 0x7ff7b4d21aa9
[0x5] conhost!Microsoft::Console::HostSignalInputThread::_GetData+0x5d 0x90b4b7fe30 0x7ff7b4d21951
[0x6] conhost!Microsoft::Console::HostSignalInputThread::_InputThread+0x51 0x90b4b7fe70 0x7ff86f87e8d7
[0x7] KERNEL32!BaseThreadInitThunk+0x17 0x90b4b7ff10 0x7ff870a4c53c
[0x8] ntdll!RtlUserThreadStart+0x2c 0x90b4b7ff40 0x0
EDIT: librsvg pulls glib, so not surprising it is problematic.
I can also reproduce it with this minimal example:
// gcc -o test.exe test.c `pkg-config --cflags --libs librsvg-2.0`
#include <librsvg/rsvg.h>
#include <stdio.h>
int main() {
RsvgHandle *handle = rsvg_handle_new();
printf("ffmpeg version 1.2.3");
return 0;
}
#!/usr/bin/env python3
import subprocess
import sys
count = 0
while True:
count += 1
result = subprocess.run(
['./test.exe', '-version'],
capture_output=True
)
if b'ffmpeg version ' not in result.stdout:
print(f"empty at iteration {count}")
print(f"stdout: {result.stdout!r}")
sys.exit(1)
if count % 100 == 0:
print(f"ok: {count} iterations")
@lazka: Thank you for that!
To avoid calling external code, just referencing the symbol from the library, to keep it loaded is enough to reproduce:
// gcc -o test.exe test.c `pkg-config --cflags --libs librsvg-2.0`
#include <librsvg/rsvg.h>
#include <stdio.h>
RsvgHandle *(*fn)(void) = rsvg_handle_new;
int main() {
printf("ffmpeg version 1.2.3");
return 0;
}
librsvg depends on GIO which spawns a worker thread to do some initialization, including loading some libraries. SHCore, on DllMain, calls GetModuleHandle. This tries to acquire the loader lock which is likely orphaned and instant termination occurs.
Mmh wait, actually ExitProcess acquires the loader lock before terminating all other threads. So it seems that GetModuleHandle can acquire another lock? Will keep investigating...
Ah ok, it's LdrpSnapsLock:
ntdll.dll!RtlEnterCriticalSection()
ntdll.dll!LdrpAddUnicodeStringToSnapsBuffer()
ntdll.dll!LdrpLogInternal()
ntdll.dll!LdrGetDllHandle()
KERNELBASE.dll!00007fff76ddfc86()
It's acquired regardless if loader snaps is enabled or not. Of course that lock can be acquired when the loader lock is not held, otheriwse it would be useless. SHCore.dll is acquiring a lock from DllMain (on process exit) and that is forbidden.
What Windows version are you using? I don't see SHCore.dll being loaded, and I don't see unexpected NtTerminateProcess. Does it happen always for you?
I don't see SHCore.dll being loaded
Can you check the strace file after test.sh fails? Here's what I get locally:
$ ./test.sh
ok: 0
(repeat many times)
empty: 0
--- Process 14568 created
--- Process 14568 loaded C:\Windows\System32\ntdll.dll at 00007fff79520000
--- Process 14568 loaded C:\Windows\System32\kernel32.dll at 00007fff78c10000
--- Process 14568 loaded C:\Windows\System32\KernelBase.dll at 00007fff76d90000
(...)
$ cat t.strace | grep -i shcore
--- Process 14568 loaded C:\Windows\System32\SHCore.dll at 00007fff79060000
I get it even on successful runs, but that may depend on timings. SHCore can be delay-loaded and that can happen be done from worker threads.
Does it happen always for you?
No, most of the times ffmpeg exits regularly and the output is correct. However sometimes ffmpeg exits with empty output, and in these cases I get the unexpected NtTerminateProcess call: https://github.com/msys2/MINGW-packages/issues/26646#issuecomment-3599194708
I can reproduce with this sample:
#ifndef UNICODE
#define UNICODE
#define _UNICODE
#endif
#include <windows.h>
#include <stdio.h>
#include <process.h>
static unsigned int __stdcall
thread_main(void* user_data)
{
for (;;)
{
GetModuleHandle(L"ntdll.dll");
}
return 0;
}
int
main(void)
{
for (int i = 0; i < 8; i++)
_beginthreadex(NULL, 0, thread_main, NULL, 0, NULL);
/* By default, stdout is unbuffered when output goes to
* an interactive device. Call setvbuf to make this test
* effective regardless of the output device.
*/
setvbuf(stdout, NULL, _IOFBF, 4096);
printf("hello world!\n");
LoadLibraryEx(L"shcore.dll", NULL, LOAD_LIBRARY_SEARCH_SYSTEM32);
}
Run in a loop from bash:
while [ 1 ]
do
echo "Launching"
./test-app.exe
done
Can you check the strace file after test.sh fails? Here's what I get locally:
There is no shcore.dll. I understand that it is loaded at some point, because I see it loaded if I add a sleep before printf, but then it doesn't reproduce anymore. So while it may be one of the triggers for this issue, it doesn't seems to be the only one and generally it is full of race conditions at the start of the program. Adding sleep, presumably allowing glib thread to finish it's initialization, makes the issue go away, at least on my end.
I can reproduce with this sample:
Yes, I can see this issue too, all I'm saying that shcore.dll doesn't seem to be the only cause of the original one.
Generally adding a atexit(fflush_streams); is probably pragmatic thing to do, given that CRT fflush can be not executed if other libraries misbehave. Though I'm not very found of this solution.
Also for MSVC, linking CRT statically avoids the issue altogether. Basically flush happens when unloading our module in this case.
EDIT: glib itself also workaround this issue in their test application https://gitlab.gnome.org/GNOME/glib/-/blob/135733147629b27504f0086d783f15a89dd63c9d/build-aux/win32/app.c#L107-134
I've created #26709 to work around this in ffmpeg for now.
I have now opened:
- https://github.com/microsoft/wil/issues/567
- https://github.com/mingw-w64/mingw-w64/pull/141