gopy
gopy copied to clipboard
Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT
I found what appears to be a bug in bind/symbols.go.
Consider the following Go function that was generated by gopy:
func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
_fun_arg := pythonCallback
_saved_thread := C.PyEval_SaveThread() // Release GIL
defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred)
vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
if __err != nil {
return
}
if boolPyToGo(goRun) {
go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
if C.PyCallable_Check(_fun_arg) == 0 {
return
}
_gstate := C.PyGILState_Ensure() // Acquire GIL
_fcargs := C.PyTuple_New(2)
C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
C.PyObject_CallObject(_fun_arg, _fcargs)
C.gopy_decref(_fcargs)
C.gopy_err_handle()
C.PyGILState_Release(_gstate) // Release GIL
}, C.GoString(callbackArgument))
} else {
gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
if C.PyCallable_Check(_fun_arg) == 0 {
return
}
_gstate := C.PyGILState_Ensure() // Acquire GIL
_fcargs := C.PyTuple_New(2)
C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
C.PyObject_CallObject(_fun_arg, _fcargs)
C.gopy_decref(_fcargs)
C.gopy_err_handle()
C.PyGILState_Release(_gstate) // Release GIL
}, C.GoString(callbackArgument))
}
}
This is generated for a Go method ExampleFunction of a GoObject struct. The intent is for this method to be called from Python.
The bug is that we access the C/Python API function C.PyCallable_Check before acquiring the GIL, leading to a SEGFAULT. The problem lies in the following excerpt of the above function:
if C.PyCallable_Check(_fun_arg) == 0 {
return
}
_gstate := C.PyGILState_Ensure() // Acquire GIL
_fcargs := C.PyTuple_New(2)
Notice that we only acquire the GIL via PyGILState_Ensure after the call to C.PyCallable_Check(_fun_arg), which leads to a SEGFAULT.
The changes in this PR will result in the above Go function being generated differently as:
func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
_fun_arg := pythonCallback
_saved_thread := C.PyEval_SaveThread() // Release GIL
defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred)
vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
if __err != nil {
return
}
if boolPyToGo(goRun) {
go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
_gstate := C.PyGILState_Ensure() // Acquire GIL
if C.PyCallable_Check(_fun_arg) == 0 {
C.PyGILState_Release(_gstate) // Release GIL
return
}
_fcargs := C.PyTuple_New(2)
C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
C.PyObject_CallObject(_fun_arg, _fcargs)
C.gopy_decref(_fcargs)
C.gopy_err_handle()
C.PyGILState_Release(_gstate) // Release GIL
}, C.GoString(callbackArgument))
} else {
gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
_gstate := C.PyGILState_Ensure() // Acquire GIL
if C.PyCallable_Check(_fun_arg) == 0 {
C.PyGILState_Release(_gstate) // Release GIL
return
}
_fcargs := C.PyTuple_New(2)
C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
C.PyObject_CallObject(_fun_arg, _fcargs)
C.gopy_decref(_fcargs)
C.gopy_err_handle()
C.PyGILState_Release(_gstate) // Release GIL
}, C.GoString(callbackArgument))
}
}
Notice that we now call C.PyGILState_Ensure() before the call to C.PyCallable_Check(_fun_arg). Likewise, we call C.PyGILState_Release(_gstate) before returning within the body of the associated if-statement to ensure that there is a matching call (to match the _gstate := C.PyGILState_Ensure()).
In terms of testing, I've just been using the modified gopy in an application I'm developing. I'm in the process of doing more testing in the meantime; however, I am confident that this is a bug, as you must hold the Python GIL when interfacing/invoking any of the C/Python API.
From the "Thread State and the Global Interpreter Lock" subsection of the "Initialization, Finalization, and Threads" Python C-API documentation:
"Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions."
On a separate (but related) note -- since we defer C.PyEval_RestoreThread(_saved_thread) at the beginning of the method's execution, it's a little silly to release the GIL via C.PyGILState_Release(_gstate) immediately before returning (even in the case where we don't call into Go code); however, we need to have matching calls between C.PyEval_SaveThread() and C.PyEval_RestoreThread() as well as PyGILState_Ensure() and PyGILState_Release(), so I think it ultimately makes sense to do things this way for now. In theory, we could provide more complex logic to handle the GIL more efficiently in the future.
Hi @Scusemua !
Thanks for your patch! You saved me several hours of life! :)) And now my turn.
PyGILState locking is needed for sure, but Go is insidious too :) Goroutines can run on different OS threads and switch them anytime...
even between PyGILState_Ensure/PyGILState_Release calls!
And sure, PyGILState_Release will panic that current thread has no python thread info and SEGFAULT etc.
My simple two clients-loaded Web server was crashing on this even with your patch.
To fix - we need to lock current goroutine to OS thread during the PyGILState_Ensure/PyGILState_Release section.
Simply add runtime.LockOSThread() call before PyGILState_Ensure (1 place in bind/symbols.go)
and
runtime.UnlockOSThread() after every PyGILState_Release call (3 places in bind/symbols.go)
And now my test server is running fine, calling Python callback in the web handler :)
Feel free to add this to your branch and be brave to propose your PR for merge!
Have nice days guys!
Hi @nickeygit,
Yeah, I also found that I needed to pin the goroutines to OS threads, or else I would get really strange errors -- just like you're describing. I just think I forgot to add an update about this to my original post. 😅
With these two fixes, everything has been working fine on my end with no more issues for months! :)
I've encountered a new issue in which the application encounters a segmentation fault when calling the resolve callback function.
Sometimes, the segmentation fault occurs on this line:
if C.PyCallable_Check(_fun_arg) == 0 {
Other times, it doesn't occur until C.PyObject_CallObject(_fun_arg, _fcargs).
The issue is with the _fun_arg variable. The _fcargs is fine. I've inspected the callstack in GDB and whatnot, and when the segmentation fault occurs, it's because the _fun_arg is pointing to garbage, rather than the intended PyObject*.
At first, I thought it was perhaps because the callback was being executed before the original thread state was restored by
defer C.PyEval_RestoreThread(_saved_thread)
But I tried adding a sync.WaitGroup such that the callback cannot run until the C.PyEval_RestoreThread(_saved_thread) statement is restored, and the segmentation fault still occurs, with the _fun_arg pointing to garbage...
Has anybody else encountered this? Have you ever experienced this issue, @nickeygit?