glusterfs
glusterfs copied to clipboard
Introduce '--fuse-setlk-handle-interrupt' option
TL;DR this is a workaround to certain issues with locking.
In detail:
A number of semantic issues came up in relation to locking, due to which the jolly optimistic approach to lock interruption -- "well, we just make use of whatever cancellation primitives the locking API has to offer" -- is not always a viable choice.
Solidifying the lock subsystem is an ongoing effort; however, until this reaches completion, we need a workaround, which, in this case, boils down to allowing users (mounters) to disable interruptability of locks via a gliusterfs command line / mount option.
The issues that most obviously interfere with lock interruption are: #3177, #3179. (Note that by now the former one is resolved and closed; I'm mentioning it now as the referred misbehavior might still be present on older Glusterfs versions where this commit might get backported.)
Update: #3179 Change-Id: I28568d6182b2e716b36995ee28b26726af633942 Signed-off-by: Csaba Henk [email protected]
CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format
index 3cade9a65..5c2567c89 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
#define O_PATH 010000000 /* from asm-generic/fcntl.h */
#endif
-
#ifndef EBADFD
/* Mac OS X does not have EBADFD */
#define EBADFD EBADF
/run regression
1 test(s) failed ./tests/basic/distribute/spare_file_rebalance.t
0 test(s) generated core
5 test(s) needed retry ./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t ./tests/000-flaky/bugs_distribute_bug-1117851.t ./tests/000-flaky/bugs_glusterd_bug-857330/normal.t ./tests/000-flaky/bugs_glusterd_bug-857330/xml.t ./tests/basic/distribute/spare_file_rebalance.t
4 flaky test(s) marked as success even though they failed ./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t ./tests/000-flaky/bugs_distribute_bug-1117851.t ./tests/000-flaky/bugs_glusterd_bug-857330/normal.t ./tests/000-flaky/bugs_glusterd_bug-857330/xml.t https://build.gluster.org/job/gh_centos7-regression/2789/
/run regression
/run regression
1 test(s) failed ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
0 test(s) generated core
3 test(s) needed retry ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t ./tests/000-flaky/basic_mount-nfs-auth.t ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t https://build.gluster.org/job/gh_centos7-regression/2808/
@csabahenk while looking at another PR (#3832) I've realized that we have a design issue in locks xlator that makes interrupts hard to work with AFR/EC.
Currently AFR takes posix locks sequentially. Once it's acquired in one brick, it tries the next one. This raises a problem when the first brick succeeds but there's a failure in the second one (caused by interrupt or anything else). We have two options:
- Mark the brick as bad and continue with the other bricks (this is what AFR currently does).
- Revoke the already granted lock in the first brick and return the error.
For interrupts, first option cannot be used because EINTR is not a regular error. The user wants to cancel the whole lock, so we can't proceed getting a lock on other bricks.
However, second option doesn't work correctly. Locks xlator implements posix locks as they are defined by the posix standard. This means that granted locks are "combined" with other existing granted locks. Unfortunately, in some cases this makes it impossible to revoke an already granted lock without unwanted side effects.
For example consider a process that takes two read locks on the same file, but their ranges overlap a bit. Once both of them are granted in one brick, revoking any of them because of an interrupt will cause the overlapped range to be unlocked, even if one of the locks is still granted (from the point of view of the user, one of the locks has never been granted because it returned EINTR, so the user still expects that the other lock still covers the overlapping region).
So for interrupts there's no way to correctly handle them in AFR/EC in some cases.
Shouldn't we set the default value for --fuse-setlk-handle-interrupt
to off
?
@csabahenk Can you please refresh the patch?
CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format
index 9cb308f33..93b6615ac 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
#define O_PATH 010000000 /* from asm-generic/fcntl.h */
#endif
-
#ifndef EBADFD
/* Mac OS X does not have EBADFD */
#define EBADFD EBADF
diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index bed6c3f98..403c1f1d3 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -6348,7 +6348,8 @@ fuse_priv_dump(xlator_t *this)
if (!this)
return -1;
- private = this->private;
+ private
+ = this->private;
if (!private)
return -1;
@@ -6497,7 +6498,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
glusterfs_graph_t *graph = NULL;
struct pollfd pfd = {0};
- private = this->private;
+ private
+ = this->private;
graph = data;
@@ -6522,7 +6524,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
(event == GF_EVENT_CHILD_DOWN)) {
pthread_mutex_lock(&private->sync_mutex);
{
- private->event_recvd = 1;
+ private
+ ->event_recvd = 1;
pthread_cond_broadcast(&private->sync_cond);
}
pthread_mutex_unlock(&private->sync_mutex);
@@ -6531,16 +6534,18 @@ notify(xlator_t *this, int32_t event, void *data, ...)
pthread_mutex_lock(&private->sync_mutex);
{
if (!private->fuse_thread_started) {
- private->fuse_thread_started = 1;
+ private
+ ->fuse_thread_started = 1;
start_thread = _gf_true;
}
}
pthread_mutex_unlock(&private->sync_mutex);
if (start_thread) {
- private->fuse_thread = GF_CALLOC(private->reader_thread_count,
- sizeof(pthread_t),
- gf_fuse_mt_pthread_t);
+ private
+ ->fuse_thread = GF_CALLOC(private->reader_thread_count,
+ sizeof(pthread_t),
+ gf_fuse_mt_pthread_t);
for (i = 0; i < private->reader_thread_count; i++) {
ret = gf_thread_create(&private->fuse_thread[i], NULL,
fuse_thread_proc, this, "fuseproc");
@@ -6574,7 +6579,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
if (fuse_get_mount_status(this) != 0) {
goto auth_fail_unlock;
}
- private->mount_finished = _gf_true;
+ private
+ ->mount_finished = _gf_true;
} else if (pfd.revents) {
gf_log(this->name, GF_LOG_ERROR,
"mount pipe closed without status");
/run regression
1 test(s) failed ./tests/features/flock_interrupt.t
0 test(s) generated core
4 test(s) needed retry ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t ./tests/000-flaky/glusterd-restart-shd-mux.t ./tests/bugs/distribute/bug-1099890.t ./tests/features/flock_interrupt.t
1 flaky test(s) marked as success even though they failed ./tests/000-flaky/glusterd-restart-shd-mux.t https://build.gluster.org/job/gh_centos7-regression/3062/
CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format
index 9cb308f33..93b6615ac 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
#define O_PATH 010000000 /* from asm-generic/fcntl.h */
#endif
-
#ifndef EBADFD
/* Mac OS X does not have EBADFD */
#define EBADFD EBADF
diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index bed6c3f98..403c1f1d3 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -6348,7 +6348,8 @@ fuse_priv_dump(xlator_t *this)
if (!this)
return -1;
- private = this->private;
+ private
+ = this->private;
if (!private)
return -1;
@@ -6497,7 +6498,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
glusterfs_graph_t *graph = NULL;
struct pollfd pfd = {0};
- private = this->private;
+ private
+ = this->private;
graph = data;
@@ -6522,7 +6524,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
(event == GF_EVENT_CHILD_DOWN)) {
pthread_mutex_lock(&private->sync_mutex);
{
- private->event_recvd = 1;
+ private
+ ->event_recvd = 1;
pthread_cond_broadcast(&private->sync_cond);
}
pthread_mutex_unlock(&private->sync_mutex);
@@ -6531,16 +6534,18 @@ notify(xlator_t *this, int32_t event, void *data, ...)
pthread_mutex_lock(&private->sync_mutex);
{
if (!private->fuse_thread_started) {
- private->fuse_thread_started = 1;
+ private
+ ->fuse_thread_started = 1;
start_thread = _gf_true;
}
}
pthread_mutex_unlock(&private->sync_mutex);
if (start_thread) {
- private->fuse_thread = GF_CALLOC(private->reader_thread_count,
- sizeof(pthread_t),
- gf_fuse_mt_pthread_t);
+ private
+ ->fuse_thread = GF_CALLOC(private->reader_thread_count,
+ sizeof(pthread_t),
+ gf_fuse_mt_pthread_t);
for (i = 0; i < private->reader_thread_count; i++) {
ret = gf_thread_create(&private->fuse_thread[i], NULL,
fuse_thread_proc, this, "fuseproc");
@@ -6574,7 +6579,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
if (fuse_get_mount_status(this) != 0) {
goto auth_fail_unlock;
}
- private->mount_finished = _gf_true;
+ private
+ ->mount_finished = _gf_true;
} else if (pfd.revents) {
gf_log(this->name, GF_LOG_ERROR,
"mount pipe closed without status");
/run regression
@xhernandez Do you have any concerns on the patch or can I merge it.