foundationdb Not all binding implementations follow the requirements of fdb_stop_network

In the current C API, it is required that fdb_stop_network be called and the network thread allowed to complete before terminating the program. It seems we aren't doing this in all of our bindings, and absent a change to this requirement as proposed in #2978, this can lead to undefined behavior.
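As a rough illustration of the stop-then-join ordering described above, here is a minimal Go sketch. It does not call the FoundationDB C API: `fakeNetwork`, `Start`, `StopAndJoin`, and `Stopped` are hypothetical names, with a goroutine standing in for the network thread, a closed `stop` channel standing in for fdb_stop_network, and the wait on `done` standing in for joining the thread.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeNetwork is a hypothetical stand-in for the network loop; it is not part
// of any FoundationDB binding.
type fakeNetwork struct {
	stop     chan struct{} // closed to request shutdown (role of fdb_stop_network)
	done     chan struct{} // closed when run returns (role of joining the thread)
	stopOnce sync.Once     // makes repeated stop requests safe
}

func newFakeNetwork() *fakeNetwork {
	return &fakeNetwork{stop: make(chan struct{}), done: make(chan struct{})}
}

// run blocks until a stop is requested, much like a blocking run-network call.
func (n *fakeNetwork) run() {
	defer close(n.done)
	<-n.stop
}

// Start launches the loop in the background.
func (n *fakeNetwork) Start() { go n.run() }

// StopAndJoin requests shutdown and then waits for the loop to finish.
// It must only be called after Start, otherwise it blocks forever.
func (n *fakeNetwork) StopAndJoin() {
	n.stopOnce.Do(func() { close(n.stop) })
	<-n.done
}

// Stopped reports whether the loop has exited.
func (n *fakeNetwork) Stopped() bool {
	select {
	case <-n.done:
		return true
	default:
		return false
	}
}

func main() {
	n := newFakeNetwork()
	n.Start()
	n.StopAndJoin() // stop first, then wait, before the program terminates
	fmt.Println("network loop finished:", n.Stopped())
}
```

The point of the sketch is only the ordering: the program does not proceed to exit until the loop has acknowledged the stop request.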

It looks like the current state of our in-tree bindings is:

Ruby and Python follow the requirements of the API (stop then join)
Java stops the network thread but doesn't join it
Go doesn't stop the network thread automatically or provide any API to do it manually as far as I can tell

If #2978 is done, then this won't be an issue. However, I suspect that solving this problem is easier than the other one, so it may be worth updating the two bindings in the meantime.

Apr 23 '20 20:04 ajbeamon

It seems I may have been mistaken about the Java case: it looks like our implementation of stop network does block until the run network call terminates.

May 20 '20 18:05 ajbeamon

In Go land, it seems a bit harder to do.

Ruby, Python, and Java use the hooks their languages provide (atexit/onShutdownHook, respectively) to register functions that are called when the program ends. Go doesn't seem to have an equivalent. There is SetFinalizer, but it doesn't seem helpful in this case, as the documentation says:

The finalizer is scheduled to run at some arbitrary time after the program can no longer reach the object to which obj points. There is no guarantee that finalizers will run before a program exits

So it seems Go really emphasizes handling cleanup explicitly, and the solution has to be a change in the API itself.
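A minimal sketch of what that explicit style could look like for a caller, assuming the binding exposed some stop function. Here `stopNetwork` is a hypothetical callback and `run` is a hypothetical helper, not an existing fdb API. The work lives in a helper so that the deferred cleanup executes before the process exits; deferred calls do not run when os.Exit is called directly.

```go
package main

import (
	"fmt"
	"os"
)

// run holds the program's real work so that the deferred cleanup executes
// before the process exits. stopNetwork is a hypothetical callback standing in
// for whatever stop function a binding might expose.
func run(stopNetwork func()) (exitCode int) {
	defer stopNetwork() // runs on normal return and on panic, but not on os.Exit
	// ... application work would go here ...
	return 0
}

func main() {
	stopped := false
	code := run(func() { stopped = true })
	fmt.Println("stopped before exit:", stopped)
	if code != 0 {
		os.Exit(code) // cleanup has already run inside run
	}
}
```

The design choice is to keep os.Exit out of the function that owns the defer, since that is the one path on which deferred cleanup is skipped.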

May 20 '20 19:05 vishesh

It could be that the real requirement is that you join the network thread before returning from main, and that atexit is too late to avoid the undefined behavior (atexit is also when global destructors are run).

May 20 '20 19:05 sfc-gh-anoyes

Or maybe the fact that we are waiting for fdb_run_network to stop but not actually joining the thread that it's running in is a problem.

May 20 '20 20:05 ajbeamon

@vishesh So are you saying that we should expose the stopNetwork function in Go and that's it for now?

May 22 '20 18:05 ajbeamon

In Go land, it seems a bit harder to do.

Indeed it is. This is the best I could come up with:

import _ "unsafe" // go:linkname requires the file to import unsafe

//go:linkname runtime_addExitHook runtime.addExitHook
func runtime_addExitHook(f func(), runOnNonZeroExit bool)

func init() {
	// this is a mitigation for https://github.com/apple/foundationdb/issues/3015
	// and it has the purpose of having our tests with -race enabled not crash with SIGSEGV
	// due to the destructors being invoked while the network thread is still running
	runtime_addExitHook(fdb.StopNetwork, true)
}

I am exposing StopNetwork() in a PR here, and will test whether this approach works for tests run with -race.

May 14 '24 13:05 gm42
