
Feature: sbd-inquisitor,sbd-md: execute commands with sub-processes for respective devices

Open gao-yan opened this issue 2 years ago • 3 comments

This is a universal solution to prevent fencing from hanging on silently blocked devices, as originally brought up in https://github.com/ClusterLabs/sbd/pull/119.

gao-yan avatar Aug 01 '22 10:08 gao-yan

These are some points I can think of so far:

  • For now it's only done for dump. I'm planning to do this also for the list and message commands, which are commonly used in the sbd fence agents.

  • If some of the "sub-process" code looks familiar to you, indeed it's mainly borrowed from pacemaker/lib/services/services_linux.c :-)

  • I didn't choose the signalfd mechanism for the synchronous way, since it isn't available on all operating systems (it's Linux-specific). OTOH the "self-pipe implementation" is portable and seems to work well. I'd rather not bloat the implementation too much.

  • Handling of error codes including logging of errors in the "sub-process" code still keeps pacemaker's conventions to be as user-friendly/clear/consistent as possible. It doesn't require anything more than libcrmcommon anyway.

  • For now both the synchronous and the asynchronous solutions are included. I've been thinking there might be use cases where one of them is more suitable than the other. The synchronous way is the default for now. Of course that can be changed as well.

  • Async IO timeout (-I) is used as the timeout of the list servant. I'm planning to use it also for the dump servant, but msgwait as the timeout for the message servant. I'd rather not introduce separate timeouts for this purpose and confuse users even more :-)

  • I've only done some basic tests. I haven't tried it with a silently blocked device yet. It's supposed to work theoretically :) Hopefully some regression tests for this can be figured out and added.

  • There's a static mainloop variable in sbd-pacemaker.c. That's not done yet, but this extern one could probably be reused there as well.
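
For readers unfamiliar with the "self-pipe" approach mentioned above, here is a minimal sketch of a synchronous wait with a timeout. It is not the code from this PR or from pacemaker: `run_with_timeout`, `sigchld_pipe` and the return-value convention are hypothetical names chosen for illustration, and only plain POSIX calls are used. The SIGCHLD handler writes a byte to a non-blocking pipe, and the parent polls the read end with a deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static int sigchld_pipe[2] = { -1, -1 };

/* Async-signal-safe SIGCHLD handler: write one byte so poll() wakes up. */
static void sigchld_handler(int sig)
{
    int saved_errno = errno;

    (void) sig;
    if (write(sigchld_pipe[1], "", 1) < 0) {
        /* Pipe full: a wakeup is already pending, nothing to do. */
    }
    errno = saved_errno;
}

static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper (illustration only): run argv in a sub-process and
 * wait at most timeout_ms. Returns the exit code, 128+signal if the child
 * was killed by a signal, -1 on setup error, -2 on timeout. */
int run_with_timeout(char *const argv[], int timeout_ms)
{
    struct sigaction sa, old_sa;
    struct pollfd pfd;
    long long deadline;
    char buf[16];
    pid_t pid;
    int status = 0, rc = -1, i;

    assert(timeout_ms >= 0);
    if (pipe(sigchld_pipe) < 0) {
        return -1;
    }
    for (i = 0; i < 2; i++) {
        fcntl(sigchld_pipe[i], F_SETFL, O_NONBLOCK);
        fcntl(sigchld_pipe[i], F_SETFD, FD_CLOEXEC);
    }
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);   /* install before fork: no lost wakeup */

    pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    deadline = now_ms() + timeout_ms;
    pfd.fd = sigchld_pipe[0];
    pfd.events = POLLIN;
    while (pid > 0) {
        long long left = deadline - now_ms();
        int n = poll(&pfd, 1, left > 0 ? (int) left : 0);

        if (n < 0 && errno == EINTR) {
            continue;                   /* interrupted by SIGCHLD itself */
        } else if (n < 0) {
            break;                      /* unexpected poll error, rc stays -1 */
        } else if (n > 0) {
            while (read(sigchld_pipe[0], buf, sizeof(buf)) > 0) {
                /* drain pending wakeups */
            }
            if (waitpid(pid, &status, WNOHANG) == pid) {
                rc = WIFEXITED(status) ? WEXITSTATUS(status)
                                       : 128 + WTERMSIG(status);
                break;
            }
            continue;                   /* some other child exited */
        }
        /* Timed out. A child stuck in uninterruptible I/O may not die right
         * away, so reap only if possible and never block on it. */
        kill(pid, SIGKILL);
        waitpid(pid, &status, WNOHANG);
        rc = -2;
        break;
    }
    sigaction(SIGCHLD, &old_sa, NULL);
    close(sigchld_pipe[0]);
    close(sigchld_pipe[1]);
    return rc;
}
```

Calling `run_with_timeout` with `{ "sleep", "5", NULL }` and a 200 ms timeout would return -2 in this sketch, while a command that exits in time returns its exit code. The point of the design is that the caller never blocks in `waitpid()`, which is what matters if the child hangs on a silently blocked device.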

Any ideas/suggestions are appreciated. And no hurry.

gao-yan avatar Aug 01 '22 10:08 gao-yan

Thanks! Give me a bit to have a closer look ...

wenningerk avatar Aug 03 '22 12:08 wenningerk

Resolved an issue found by coverity.

BTW, I'll be AFK for 4 weeks. So take your time :-) Thanks.

gao-yan avatar Aug 04 '22 05:08 gao-yan