ISSUE: obj_msync_nofail fails when msync is interrupted by a signal
Environment Information
- PMDK package version(s): 1.5, 1.6, 1.7
Please provide a reproduction of the bug:
The crash happens when pmem_msync inside obj_msync_nofail fails with EINTR. Under some circumstances this happens on ecryptfs: when the application is aborted after a persist, the subsequent pool open fails.
The crash will probably also happen whenever a signal interrupts msync.
Actual behavior:
Opening the pool fails with:
<libpmemobj>: <1> [obj.c:1006 obj_msync_nofail] pmem_msync: Interrupted system call
Expected behavior:
Opening the pool succeeds.
Details
obj_msync_nofail should probably retry if pmem_msync fails with EINTR. Changing the implementation of obj_msync_nofail to the following fixes the issue:

    static void
    obj_msync_nofail(const void *addr, size_t size)
    {
        /* retry as long as msync is interrupted by a signal */
        while (pmem_msync(addr, size) != 0) {
            /* any error other than EINTR is still fatal */
            if (errno != EINTR)
                FATAL("!pmem_msync");
        }
    }

Other problems
- Some functions (for example obj_nopmem_memset) call pmem_msync without checking its return value. Those call sites should call obj_msync_nofail instead.
- os_part_deep_common calls pmem_msync and returns -1 on error. Perhaps it should also retry here if pmem_msync fails with EINTR, since the user will not know that the failure was caused by a signal interrupting msync and that the operation should simply be retried.
I'm not entirely sure where the EINTR comes from: the man page doesn't list EINTR as a possible error for msync(), and none of the VFS, ext4 or ecryptfs code returns it except when fatal_signal_pending() is true. But I haven't looked deeply enough.
As for your fix: I think we should change only pmem_msync(), not obj_msync_nofail().
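A minimal sketch of that suggestion, assuming the EINTR retry moves into the low-level msync wrapper so that every caller (obj_msync_nofail, os_part_deep_common, and so on) benefits without changes. `msync_retry_eintr` is a hypothetical stand-in, not libpmem's actual pmem_msync. It rounds the start address down to a page boundary because POSIX msync() requires a page-aligned address. Note that this version retries indefinitely, which is exactly the case the signal-flood concern in the next comment applies to.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for pmem_msync(): flush [addr, addr + len) to
 * the backing store, retrying whenever msync() fails with EINTR.
 * Returns 0 on success, or -1 with errno set on any other failure.
 */
static int
msync_retry_eintr(const void *addr, size_t len)
{
	/* msync() requires a page-aligned start address, so round down */
	uintptr_t pagesize = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t uptr = (uintptr_t)addr & ~(pagesize - 1);
	len += (uintptr_t)addr - uptr;

	int ret;
	do {
		ret = msync((void *)uptr, len, MS_SYNC);
	} while (ret != 0 && errno == EINTR); /* only EINTR is retried */

	return ret;
}
```

Callers keep their existing error handling; they simply never observe EINTR from this wrapper.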
One concern: what if the process is flooded with signals from some source? msync can be a long operation (e.g. on NFS), and I'm not sure it is guaranteed to make progress before being interrupted.
As for ecryptfs, this may be a bug in the filesystem: although msync returns EINTR, strace shows no signal being delivered. As for the man page, I found the following explanation: https://unix.stackexchange.com/questions/468064/can-a-system-call-with-non-documented-eintr-return-eintr
Let's add an environment variable to control this behavior and, by default, retry up to 100 times on EINTR.
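A sketch of this proposal, which also bounds the retry loop so that a flood of signals cannot keep the caller spinning forever. The environment variable name `PMEM_MSYNC_EINTR_RETRIES` and the helpers `msync_eintr_retries` and `msync_bounded_retry` are hypothetical; PMDK does not define them. The range passed in is assumed to be already page-aligned.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Default from the proposal: retry up to 100 times on EINTR. */
#define MSYNC_EINTR_RETRIES_DEFAULT 100

/*
 * Read the retry limit from a hypothetical environment variable
 * (PMEM_MSYNC_EINTR_RETRIES is NOT an existing PMDK variable).
 * Falls back to the default if unset or not a non-negative integer.
 */
static long
msync_eintr_retries(void)
{
	const char *s = getenv("PMEM_MSYNC_EINTR_RETRIES");
	if (s == NULL || *s == '\0')
		return MSYNC_EINTR_RETRIES_DEFAULT;

	char *end;
	errno = 0;
	long n = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || n < 0)
		return MSYNC_EINTR_RETRIES_DEFAULT;
	return n;
}

/*
 * Call msync() on an already page-aligned range, retrying at most
 * msync_eintr_retries() additional times on EINTR. If the limit is
 * exhausted, -1 is returned with errno still set to EINTR, so the
 * caller can report it instead of looping forever.
 */
static int
msync_bounded_retry(void *addr, size_t len)
{
	long retries = msync_eintr_retries();
	int ret;

	while ((ret = msync(addr, len, MS_SYNC)) != 0 &&
	    errno == EINTR && retries-- > 0)
		; /* interrupted by a signal: try again */

	return ret;
}
```

Setting the variable to 0 disables retrying entirely, which keeps the current behavior available for anyone who prefers to see EINTR.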
This improvement is not considered vital at the moment, so we do not have the resources to fulfil your request. Sorry.