glusterfs icon indicating copy to clipboard operation
glusterfs copied to clipboard

Introduce '--fuse-setlk-handle-interrupt' option

Open csabahenk opened this issue 2 years ago • 6 comments

TL;DR this is a workaround to certain issues with locking.

In detail:

A number of semantic issues came up in relation to locking, due to which the jolly optimistic approach to lock interruption -- "well, we just make use of whatever cancellation primitives the locking API has to offer" -- is not always a viable choice.

Solidifying the lock subsystem is an ongoing effort; however, until this reaches completion, we need a workaround, which, in this case, boils down to allowing users (mounters) to disable interruptability of locks via a gliusterfs command line / mount option.

The issues that most obviously interfere with lock interruption are: #3177, #3179. (Note that by now the former one is resolved and closed; I'm mentioning it now as the referred misbehavior might still be present on older Glusterfs versions where this commit might get backported.)

Update: #3179 Change-Id: I28568d6182b2e716b36995ee28b26726af633942 Signed-off-by: Csaba Henk [email protected]

csabahenk avatar Sep 01 '22 09:09 csabahenk

CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format

index 3cade9a65..5c2567c89 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
 #define O_PATH 010000000 /* from asm-generic/fcntl.h */
 #endif
 
-
 #ifndef EBADFD
 /* Mac OS X does not have EBADFD */
 #define EBADFD EBADF

gluster-ant avatar Sep 01 '22 09:09 gluster-ant

/run regression

csabahenk avatar Sep 01 '22 12:09 csabahenk

1 test(s) failed ./tests/basic/distribute/spare_file_rebalance.t

0 test(s) generated core

5 test(s) needed retry ./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t ./tests/000-flaky/bugs_distribute_bug-1117851.t ./tests/000-flaky/bugs_glusterd_bug-857330/normal.t ./tests/000-flaky/bugs_glusterd_bug-857330/xml.t ./tests/basic/distribute/spare_file_rebalance.t

4 flaky test(s) marked as success even though they failed ./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t ./tests/000-flaky/bugs_distribute_bug-1117851.t ./tests/000-flaky/bugs_glusterd_bug-857330/normal.t ./tests/000-flaky/bugs_glusterd_bug-857330/xml.t https://build.gluster.org/job/gh_centos7-regression/2789/

gluster-ant avatar Sep 02 '22 06:09 gluster-ant

/run regression

csabahenk avatar Sep 02 '22 23:09 csabahenk

/run regression

mohit84 avatar Sep 06 '22 06:09 mohit84

1 test(s) failed ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t

0 test(s) generated core

3 test(s) needed retry ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t ./tests/000-flaky/basic_mount-nfs-auth.t ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t https://build.gluster.org/job/gh_centos7-regression/2808/

gluster-ant avatar Sep 06 '22 13:09 gluster-ant

@csabahenk while looking at another PR (#3832) I've realized that we have a design issue in locks xlator that makes interrupts hard to work with AFR/EC.

Currently AFR takes posix locks sequentially. Once it's acquired in one brick, it tries the next one. This raises a problem when the first brick succeeds but there's a failure in the second one (caused by interrupt or anything else). We have two options:

  1. Mark the brick as bad and continue with the other bricks (this is what AFR currently does).
  2. Revoke the already granted lock in the first brick and return the error.

For interrupts, first option cannot be used because EINTR is not a regular error. The user wants to cancel the whole lock, so we can't proceed getting a lock on other bricks.

However, second option doesn't work correctly. Locks xlator implements posix locks as they are defined by the posix standard. This means that granted locks are "combined" with other existing granted locks. Unfortunately, in some cases this makes it impossible to revoke an already granted lock without unwanted side effects.

For example consider a process that takes two read locks on the same file, but their ranges overlap a bit. Once both of them are granted in one brick, revoking any of them because of an interrupt will cause the overlapped range to be unlocked, even if one of the locks is still granted (from the point of view of the user, one of the locks has never been granted because it returned EINTR, so the user still expects that the other lock still covers the overlapping region).

So for interrupts there's no way to correctly handle them in AFR/EC in some cases.

Shouldn't we set the default value for --fuse-setlk-handle-interrupt to off ?

xhernandez avatar Sep 29 '22 09:09 xhernandez

@csabahenk Can you please refresh the patch?

mohit84 avatar Nov 29 '22 04:11 mohit84

CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format

index 9cb308f33..93b6615ac 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
 #define O_PATH 010000000 /* from asm-generic/fcntl.h */
 #endif
 
-
 #ifndef EBADFD
 /* Mac OS X does not have EBADFD */
 #define EBADFD EBADF
diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index bed6c3f98..403c1f1d3 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -6348,7 +6348,8 @@ fuse_priv_dump(xlator_t *this)
     if (!this)
         return -1;
 
-    private = this->private;
+   private
+    = this->private;
 
     if (!private)
         return -1;
@@ -6497,7 +6498,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
     glusterfs_graph_t *graph = NULL;
     struct pollfd pfd = {0};
 
-    private = this->private;
+   private
+    = this->private;
 
     graph = data;
 
@@ -6522,7 +6524,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
                 (event == GF_EVENT_CHILD_DOWN)) {
                 pthread_mutex_lock(&private->sync_mutex);
                 {
-                    private->event_recvd = 1;
+                   private
+                    ->event_recvd = 1;
                     pthread_cond_broadcast(&private->sync_cond);
                 }
                 pthread_mutex_unlock(&private->sync_mutex);
@@ -6531,16 +6534,18 @@ notify(xlator_t *this, int32_t event, void *data, ...)
             pthread_mutex_lock(&private->sync_mutex);
             {
                 if (!private->fuse_thread_started) {
-                    private->fuse_thread_started = 1;
+                   private
+                    ->fuse_thread_started = 1;
                     start_thread = _gf_true;
                 }
             }
             pthread_mutex_unlock(&private->sync_mutex);
 
             if (start_thread) {
-                private->fuse_thread = GF_CALLOC(private->reader_thread_count,
-                                                 sizeof(pthread_t),
-                                                 gf_fuse_mt_pthread_t);
+               private
+                ->fuse_thread = GF_CALLOC(private->reader_thread_count,
+                                          sizeof(pthread_t),
+                                          gf_fuse_mt_pthread_t);
                 for (i = 0; i < private->reader_thread_count; i++) {
                     ret = gf_thread_create(&private->fuse_thread[i], NULL,
                                            fuse_thread_proc, this, "fuseproc");
@@ -6574,7 +6579,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
                         if (fuse_get_mount_status(this) != 0) {
                             goto auth_fail_unlock;
                         }
-                        private->mount_finished = _gf_true;
+                       private
+                        ->mount_finished = _gf_true;
                     } else if (pfd.revents) {
                         gf_log(this->name, GF_LOG_ERROR,
                                "mount pipe closed without status");

gluster-ant avatar Dec 07 '22 23:12 gluster-ant

/run regression

mohit84 avatar Dec 08 '22 00:12 mohit84

1 test(s) failed ./tests/features/flock_interrupt.t

0 test(s) generated core

4 test(s) needed retry ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t ./tests/000-flaky/glusterd-restart-shd-mux.t ./tests/bugs/distribute/bug-1099890.t ./tests/features/flock_interrupt.t

1 flaky test(s) marked as success even though they failed ./tests/000-flaky/glusterd-restart-shd-mux.t https://build.gluster.org/job/gh_centos7-regression/3062/

gluster-ant avatar Dec 08 '22 04:12 gluster-ant

CLANG-FORMAT FAILURE: Before merging the patch, this diff needs to be considered for passing clang-format

index 9cb308f33..93b6615ac 100644
--- a/libglusterfs/src/glusterfs/glusterfs.h
+++ b/libglusterfs/src/glusterfs/glusterfs.h
@@ -63,7 +63,6 @@
 #define O_PATH 010000000 /* from asm-generic/fcntl.h */
 #endif
 
-
 #ifndef EBADFD
 /* Mac OS X does not have EBADFD */
 #define EBADFD EBADF
diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index bed6c3f98..403c1f1d3 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -6348,7 +6348,8 @@ fuse_priv_dump(xlator_t *this)
     if (!this)
         return -1;
 
-    private = this->private;
+   private
+    = this->private;
 
     if (!private)
         return -1;
@@ -6497,7 +6498,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
     glusterfs_graph_t *graph = NULL;
     struct pollfd pfd = {0};
 
-    private = this->private;
+   private
+    = this->private;
 
     graph = data;
 
@@ -6522,7 +6524,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
                 (event == GF_EVENT_CHILD_DOWN)) {
                 pthread_mutex_lock(&private->sync_mutex);
                 {
-                    private->event_recvd = 1;
+                   private
+                    ->event_recvd = 1;
                     pthread_cond_broadcast(&private->sync_cond);
                 }
                 pthread_mutex_unlock(&private->sync_mutex);
@@ -6531,16 +6534,18 @@ notify(xlator_t *this, int32_t event, void *data, ...)
             pthread_mutex_lock(&private->sync_mutex);
             {
                 if (!private->fuse_thread_started) {
-                    private->fuse_thread_started = 1;
+                   private
+                    ->fuse_thread_started = 1;
                     start_thread = _gf_true;
                 }
             }
             pthread_mutex_unlock(&private->sync_mutex);
 
             if (start_thread) {
-                private->fuse_thread = GF_CALLOC(private->reader_thread_count,
-                                                 sizeof(pthread_t),
-                                                 gf_fuse_mt_pthread_t);
+               private
+                ->fuse_thread = GF_CALLOC(private->reader_thread_count,
+                                          sizeof(pthread_t),
+                                          gf_fuse_mt_pthread_t);
                 for (i = 0; i < private->reader_thread_count; i++) {
                     ret = gf_thread_create(&private->fuse_thread[i], NULL,
                                            fuse_thread_proc, this, "fuseproc");
@@ -6574,7 +6579,8 @@ notify(xlator_t *this, int32_t event, void *data, ...)
                         if (fuse_get_mount_status(this) != 0) {
                             goto auth_fail_unlock;
                         }
-                        private->mount_finished = _gf_true;
+                       private
+                        ->mount_finished = _gf_true;
                     } else if (pfd.revents) {
                         gf_log(this->name, GF_LOG_ERROR,
                                "mount pipe closed without status");

gluster-ant avatar Dec 08 '22 18:12 gluster-ant

/run regression

mohit84 avatar Dec 23 '22 06:12 mohit84

@xhernandez Do you have any concerns on the patch or can I merge it.

mohit84 avatar Jan 06 '23 00:01 mohit84