
GlobalMutex.LinuxGlobalMutexAdapter fails to handle EINTR

Open rmunn opened this issue 9 years ago • 3 comments

While debugging a failing unit test in the LfMerge project, I got the following exception:

SIL.PlatformUtilities.NativeException : An error with the number, 4, ocurred.
  at SIL.Threading.GlobalMutex+LinuxGlobalMutexAdapter.Wait () [0x0002d] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:208 
  at SIL.Threading.GlobalMutex+LinuxGlobalMutexAdapter.Init (Boolean initiallyOwned) [0x0007a] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:198 
  at SIL.Threading.GlobalMutex.InitializeAndLock (System.Boolean& createdNew) [0x00006] in /var/lib/TeamCity/agent/work/60b6a3b495b7759c/SIL.Core/Threading/GlobalMutex.cs:98 
  at SIL.FieldWorks.FDO.Infrastructure.Impl.SharedXMLBackendProvider.StartupInternal (Int32 currentModelVersion) [0x00000] in <filename unknown>:0 
  at SIL.FieldWorks.FDO.Infrastructure.Impl.FDOBackendProvider.StartupInternalWithDataMigrationIfNeeded (IThreadedProgress progressDlg) [0x00000] in <filename unknown>:0 
  at SIL.FieldWorks.FDO.Infrastructure.Impl.FDOBackendProvider.StartupExtantLanguageProject (IProjectIdentifier projectId, Boolean fBootstrapSystem, IThreadedProgress progressDlg) [0x00000] in <filename unknown>:0

The error number in SIL.PlatformUtilities.NativeException comes from Marshal.GetLastWin32Error(), which on Linux returns the latest value of errno, the Unix C library's all-purpose error number. http://www.virtsync.com/c-error-codes-include-errno lists errno code 4 as EINTR, "Interrupted system call". According to this SO question, this blog post, and this libc manual entry, the right thing to do when EINTR is received is usually to restart the interrupted system call. (If EINTR was received because the user hit Ctrl-C or ran kill (your process ID), then your code should already be handling that signal elsewhere and shutting down the program.)

In this case, that's certainly the right approach. The LinuxGlobalMutexAdapter needs to check for EINTR and handle it by retrying the interrupted system call, up to a small number of times (say, 5). I don't currently have time to work on a patch for this, but I will have some time next month if nobody gets to this issue before then.
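
To make the bounded-retry idea concrete, here is a minimal sketch of the pattern in C, the level at which errno and EINTR are defined. It is not the libpalaso code, and an actual fix would be written in C# inside GlobalMutex.cs. The names `retry_on_eintr`, `EINTR_MAX_RETRIES`, and `flaky_call` are hypothetical, and the cap of 5 is just the "say, 5" figure suggested above.

```c
#include <errno.h>

/* Hypothetical cap on retries, taken from the "say, 5" suggestion. */
#define EINTR_MAX_RETRIES 5

/* Any call that follows the usual convention: -1 on failure, errno set. */
typedef int (*syscall_fn)(void *arg);

/*
 * Hypothetical helper: call fn(arg), and call it again while it fails
 * with EINTR, for at most EINTR_MAX_RETRIES retries after the first try.
 * Any other failure (or success) is returned to the caller immediately.
 * If every attempt is interrupted, the result is -1 with errno == EINTR.
 */
static int retry_on_eintr(syscall_fn fn, void *arg)
{
    int result;
    int retries = 0;

    do {
        result = fn(arg);
    } while (result == -1 && errno == EINTR && ++retries <= EINTR_MAX_RETRIES);

    return result;
}

/* Stand-in for a real system call, for illustration only:
 * it fails with EINTR a fixed number of times, then succeeds. */
struct flaky {
    int failures_left;
    int calls;
};

static int flaky_call(void *arg)
{
    struct flaky *f = arg;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

A real caller would pass a thin wrapper around whichever blocking call the adapter makes in place of `flaky_call`. Capping the retries means a call that keeps getting interrupted still surfaces as an error rather than looping forever.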

rmunn avatar Feb 22 '16 06:02 rmunn

@rmunn Is this still something you could patch?

imnasnainaec avatar Feb 13 '25 15:02 imnasnainaec

> @rmunn Is this still something you could patch?

Might be able to fix this at the same time that I fix #1428, since I'll probably be touching a lot of the same code.

rmunn avatar Apr 28 '25 07:04 rmunn

I would suggest setting SIL_CORE_MAKE_GLOBAL_MUTEX_LOCAL_ONLY to true; you shouldn't get the error anymore. It was created specifically for this issue: https://github.com/sillsdev/libpalaso/blob/master/SIL.Core/Threading/GlobalMutex.cs#L181 (done in #1378). That said, it might not be the right workaround for this issue; hard to say. Matt made the change for snap packages.

hahn-kev avatar Apr 29 '25 02:04 hahn-kev