GlobalMutex.LinuxGlobalMutexAdapter fails to handle EINTR
While debugging a failing unit test in the LfMerge project, I got the following exception:
SIL.PlatformUtilities.NativeException : An error with the number, 4, ocurred.
at SIL.Threading.GlobalMutex+LinuxGlobalMutexAdapter.Wait () [0x0002d] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:208
at SIL.Threading.GlobalMutex+LinuxGlobalMutexAdapter.Init (Boolean initiallyOwned) [0x0007a] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:198
at SIL.Threading.GlobalMutex.InitializeAndLock (System.Boolean& createdNew) [0x00006] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:98
at SIL.FieldWorks.FDO.Infrastructure.Impl.SharedXMLBackendProvider.StartupInternal (Int32 currentModelVersion) [0x00000] in <filename unknown>:0
at SIL.FieldWorks.FDO.Infrastructure.Impl.FDOBackendProvider.StartupInternalWithDataMigrationIfNeeded (IThreadedProgress progressDlg) [0x00000] in <filename unknown>:0
at SIL.FieldWorks.FDO.Infrastructure.Impl.FDOBackendProvider.StartupExtantLanguageProject (IProjectIdentifier projectId, Boolean fBootstrapSystem, IThreadedProgress progressDlg) [0x00000] in <filename unknown>:0
The error number in SIL.PlatformUtilities.NativeException comes from Marshal.GetLastWin32Error(), which on Linux returns the latest value of errno, the Unix C library's all-purpose error number. http://www.virtsync.com/c-error-codes-include-errno lists errno code 4 as EINTR, "Interrupted system call". According to this SO question, this blog post, and this libc manual entry, the right thing to do when EINTR is received is usually to restart the interrupted system call. (If EINTR was received because the user hit Ctrl-C or ran kill (your process ID), then your code should already be handling that signal elsewhere and shutting down the program.)
In this case, that's certainly the right approach. The LinuxGlobalMutexAdapter needs to check for EINTR and handle it by retrying the appropriate system call, up to a small number of times (say, 5). I don't currently have time to work on a patch for this, but I will have some time next month if nobody gets to this issue before then.
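For reference, here is a minimal sketch of the bounded retry pattern, written in C at the libc level rather than in the adapter's C#. It is not the actual fix. The names `retry_on_eintr`, `blocking_call_fn`, and `flaky_call` are hypothetical, and the sketch assumes the wrapped call follows the usual convention of returning -1 and setting errno on failure.

```c
#include <errno.h>

/* Hypothetical signature: any blocking call that returns -1 and sets
 * errno on failure, as most Unix system call wrappers do. */
typedef int (*blocking_call_fn)(void *arg);

/* Run `call` once, then retry it while it fails with EINTR, for at most
 * `max_retries` additional attempts. Returns the result of the last
 * attempt; errno is left as that attempt set it. */
int retry_on_eintr(blocking_call_fn call, void *arg, int max_retries)
{
    int retries = 0;
    int rc;

    do {
        rc = call(arg);
    } while (rc == -1 && errno == EINTR && retries++ < max_retries);

    return rc;
}

/* Stand-in for an interruptible system call (illustration only): fails
 * with EINTR until the counter pointed to by `arg` reaches zero. */
int flaky_call(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

With `max_retries` set to 5, a call that is interrupted three times succeeds on the fourth attempt, while a call that keeps getting interrupted gives up after six attempts and still reports EINTR to the caller. Any other error is returned immediately without a retry.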
@rmunn Is this still something you could patch?
Might be able to fix this at the same time that I fix #1428, since I'll probably be touching a lot of the same code.
I would suggest setting SIL_CORE_MAKE_GLOBAL_MUTEX_LOCAL_ONLY to true; you shouldn't get the error anymore. It was created specifically for this issue, in #1378: https://github.com/sillsdev/libpalaso/blob/master/SIL.Core/Threading/GlobalMutex.cs#L181
That said, it might not be the right workaround for this issue; it's hard to say. Matt made the change for snap packages.