fix: Prevent calling the main process exit without WithOnClose()
My service is a multi-goroutine daemon service, and nri-plugin is one of the modules of my service. In nri-plugin I did not use WithOnClose() to set the exit callback function, I would rebuild a new stub when exiting. I found that when the stup exception occurred, the daemon service main was actually exited. I felt that it was very unfriendly to exit the deamon service due to a goroutine exception. After checking, it is found that there is a problem with the following code:
// Handle a lost connection.
func (stub *stub) connClosed() {
stub.Lock()
stub.close()
stub.Unlock()
if stub.onClose != nil {
stub.onClose()
return
}
os.Exit(0) // exits the process
}
os.Exit(0) exits the process,for the nri module package, using runtime.Goexit() is more elegant. runtime.Goexit() will only exit the current goroutine, not the daemon process.
@klihub @mikebrow @fuweid can anyone help to review this ? thanks.
My service is a multi-goroutine daemon service, and nri-plugin is one of the modules of my service. In nri-plugin I did not use WithOnClose() to set the exit callback function, I would rebuild a new stub when exiting.
@attlee-wang Not directly related to this PR, but it might be useful for you: ever since we have merged PR #91, it became possible to restart a stub by using a Stop() followed by a Start().
I found that when the stup exception occurred, the daemon service main was actually exited. I felt that it was very unfriendly to exit the deamon service due to a goroutine exception. After checking, it is found that there is a problem with the following code: ...
This behavior has been discussed and criticized earlier a few times, and it is fair to say that not without a reason. I have a few question/comments related to this.
- This original, and admittedly a bit harshly/hastily implemented behavior, was partly in place to make pre-connected/runtime-launched plugins exit once they lose their pre-connected transport (a
socketpair()which cannot be externally re-established). It would be nice and also fairly easily possible to retain this behavior. We could either save the fact in the original connection setup/adoption code that we took a socketpair and then use that in the connection lost handler, or we could simply re-check the same env. var and use its existence for the same. - Cannot you work around this immediate problem in your current code by simply creating your stub with a
WithOnClose()handler to prevent the stub internally doing aos.Exit(0)? Your handler could either do nothing or, for instance using a channel, signal your upper layer logic to recreate (or Stop()/Start()) the stub, if you are neither running it withstub.Run()nor waiting for its termination withstub.Wait()?
ping @attlee-wang
- Cannot you work around this immediate problem in your current code by simply creating your stub with a
WithOnClose()handler to prevent the stub internally doing aos.Exit(0)? Your handler could either do nothing or, for instance using a channel, signal your upper layer logic to recreate (or Stop()/Start()) the stub, if you are neither running it withstub.Run()nor waiting for its termination withstub.Wait()?
@klihub It's not that I don't use WithOnClose(). I looked at the stub.go package and exist stub.Stop(). I subjectively thought that stub.Stop() could be used. After all, WithOnClose() is just an optional function.
I think https://github.com/containerd/nri/pull/173 resolves the same problem, but does so in a way that exposes the error to the plugin better. Thanks for your contribution @attlee-wang.