glusterfs
[events-api] Write cfg file with exclusive locks
to prevent race conditions that might occur when several nodes try to create the config file at the same time if it does not yet exist (first run).
Fixes: #3714
Updates: #3715
Signed-off-by: black-dragon74 [email protected]
What's the benefit of this approach compared to creating a temporary file and then renaming it?
The issue was due to truncation of the contents of the file. By serializing the writes, we ensure that each process writes to the file only after the previous one is done with it. Creating a temp file and then renaming it is not beneficial here, as the file name, and hence the file being written to, remains the same.
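For illustration, the exclusive-lock approach described above could be sketched along these lines in Python (a minimal sketch, not the actual glusterfs LockedOpen helper; the file name and config keys are invented for the example):

```python
import fcntl
import json

def write_cfg_locked(path, cfg):
    # Open the file (creating it if absent) and take an exclusive POSIX
    # lock, so concurrent writers are serialized instead of truncating
    # each other's partially written data.
    with open(path, "a+") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        f.seek(0)
        f.truncate()          # safe now: we hold the exclusive lock
        json.dump(cfg, f, indent=4)
        f.flush()
        # the lock is released automatically when the file is closed

# Hypothetical file name and keys, for illustration only:
write_cfg_locked("eventsconfig.json", {"log-level": "INFO", "port": 24009})
```

Each contending writer blocks on `flock()` until the previous writer closes the file, which is the serialization being described.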
all contending writers will, sooner or later, update the file.
Correct. But in this way of doing things we ensure there are no overlaps, and hence no truncation or corruption of the data. Updating the file n number of times is not an issue; writing to it in parallel is.
Isn't this approach slower?
As we are using POSIX locks, yes, this is a tad slower. But then again, we are not waiting on any external sources for the data to be written, and we are not writing to a temp file and then renaming it (saving one additional system call), so performance should not take a major hit.
Is there any other benefit I missed?
None other than the basic one that I can think of.
Regards
What's the benefit of this approach compared to creating a temporary file and then renaming it?
The issue was due to truncation of the contents of the file. By serializing the writes, we ensure that each process writes to the file only after the previous one is done with it. Creating a temp file and then renaming it is not beneficial here, as the file name, and hence the file being written to, remains the same.
I don't understand this. If a temporary file is created and then renamed to the final name, corruption is not possible. Obviously it will overwrite any previously existing file, but that also happens with the LockedOpen approach.
all contending writers will, sooner or later, update the file.
Correct. But in this way of doing things we ensure there are no overlaps, and hence no truncation or corruption of the data. Updating the file n number of times is not an issue; writing to it in parallel is.
There are no overlaps. Each writer will write to a different file, so even if they write in parallel, it's not a problem. Only when writes are finished, it will rename that file to the final name. Renames are atomic, so there's no risk of corruption.
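The temp-file-plus-rename pattern being described could look roughly like this (an illustrative sketch; the file name and contents are invented for the example):

```python
import json
import os
import tempfile

def write_cfg_atomic(path, cfg):
    # Write to a temporary file in the same directory (rename is only
    # atomic within a single filesystem), then rename over the target.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".cfg-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(cfg, f, indent=4)
            f.flush()
            os.fsync(f.fileno())   # make sure data hits disk before rename
        os.rename(tmp, path)       # atomic: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp)             # discard the temp file on any failure
        raise

# Hypothetical file name and keys, for illustration only:
write_cfg_atomic("eventsconfig.json", {"log-level": "INFO", "port": 24009})
```

Because rename(2) is atomic within a filesystem, concurrent readers always see either the complete old config or the complete new one, never a truncated mix.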
Isn't this approach slower?
As we are using POSIX locks, yes, this is a tad slower. But then again, we are not waiting on any external sources for the data to be written, and we are not writing to a temp file and then renaming it (saving one additional system call), so performance should not take a major hit.
On what external source would you be waiting with the open + rename approach?
Also, you are not saving any system call. 1 open + 1 close + 1 rename are only 3 system calls. LockedOpen hides at least 2 opens + 2 closes + 1 lock (1 open, 1 close and 1 lock are inside a loop that could be executed an unbounded number of times).
So I don't see any clear benefit to the LockedOpen approach. It requires more system calls, it adds more complexity (even if it's hidden inside a class), and the final result is exactly the same. In fact, I still find the open + rename approach simpler and faster.
The only thing that LockedOpen has that I think is a real benefit is that it encapsulates the logic into a reusable class. We could do the same with open + rename to make it easier to use everywhere.
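A reusable open + rename helper of the kind suggested here might be sketched like this (hypothetical names, not existing glusterfs code):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_write(path, mode="w"):
    # Yields a file object backed by a temp file in the target's
    # directory; on clean exit the temp file atomically replaces
    # `path`, and on error it is discarded.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    f = os.fdopen(fd, mode)
    try:
        yield f
        f.flush()
        os.fsync(f.fileno())
        f.close()
        os.rename(tmp, path)
    except BaseException:
        f.close()
        os.unlink(tmp)
        raise

# Hypothetical usage, file name invented for the example:
with atomic_write("events.conf") as f:
    f.write("log-level = INFO\n")
```

This keeps the call sites as simple as the LockedOpen class while needing only the open, close, and rename system calls mentioned above.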
I think this should now be good to go.
/cc: @xhernandez
Regards
/recheck smoke
/run regression
1 test(s) failed ./tests/00-geo-rep/00-georep-verify-non-root-setup.t
0 test(s) generated core
2 test(s) needed retry
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
./tests/00-geo-rep/00-georep-verify-non-root-setup.t
https://build.gluster.org/job/gh_centos7-regression/2973/
@black-dragon74 this patch was accepted and only needed to pass the regression. I think this PR can be useful.
Sorry, I was doing some cleanup on my fork and this patch slipped past me. I will open a new one.
Regards
/run regression
1 test(s) failed ./tests/00-geo-rep/00-georep-verify-non-root-setup.t
0 test(s) generated core
3 test(s) needed retry
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
./tests/000-flaky/basic_ec_ec-quorum-count-partial-failure.t
./tests/00-geo-rep/00-georep-verify-non-root-setup.t
1 flaky test(s) marked as success even though they failed
./tests/000-flaky/basic_ec_ec-quorum-count-partial-failure.t
https://build.gluster.org/job/gh_centos7-regression/3089/
/run regression
1 test(s) failed ./tests/00-geo-rep/00-georep-verify-non-root-setup.t
0 test(s) generated core
4 test(s) needed retry
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
./tests/000-flaky/features_copy-file-range.t
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/00-geo-rep/00-georep-verify-non-root-setup.t
1 flaky test(s) marked as success even though they failed
./tests/000-flaky/features_copy-file-range.t
https://build.gluster.org/job/gh_centos7-regression/3182/
/run regression
1 test(s) failed ./tests/00-geo-rep/00-georep-verify-non-root-setup.t
0 test(s) generated core
4 test(s) needed retry
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
./tests/000-flaky/features_copy-file-range.t
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/00-geo-rep/00-georep-verify-non-root-setup.t
1 flaky test(s) marked as success even though they failed
./tests/000-flaky/features_copy-file-range.t
https://build.gluster.org/job/gh_centos7-regression/3195/
/run regression
This PR is now tracked at: https://github.com/gluster/glusterfs/pull/4079 and is archived to preserve discussions.
Regards