lupa
lupa copied to clipboard
fix segfault
Hi , I encounter similar segfault behavior as mentioned in #84 and managed to debug and fix it, thanks to provided reproducing script.
This issue seems to be related with lua garbage collection of objects that was created and referenced (luaL_ref) in coroutine lua_State. Currently all lua objects are unreferenced from lua_state by python (gc) __dealloc__ method but sometimes unreferencing object created in coroutine ends with segfault because coroutine lua_state is broken. I'm not entirely sure why it is broken but it seems that coroutines associated with such state have already been finished. It is likely that lua gc have already collected such lua_state instance.
I manage to fix this by tracking references of created objects in coroutine lua_state and unreferencing them when associated coroutine (lua_state) ends. Implementation note: Because of lua_state is also present in LuaRuntime object and I want make this fix as simple as possible, I decide to track all referenced object of LuaRuntime not just those that was referenced by coroutines.
fix #84
Can somebody please review this fix? Thanks
I discovered that my intial fix contains bug that cause that object created in coroutine cannot be used outside of this coroutine scope because it unreference all objects created in coroutine at the time the coroutine finished.
Fix 422af19 introduce new solution, that do not track references but LuaObjects itself created in coroutine. When coroutine finished all objects created in this coroutine change their lua state to parent lua state of the coroutine. This way objects references in coroutine can be used outside of the coroutine scope and their deallocation does not cause segmentation fault. This fix also does not track objects created in LuaRuntime state resulting in lower memory usage that previous solution.
Thanks for investigating this. I'm not sure if the fix needs to be this complex and would be happy to see something simpler.
If I understand correctly, then the issue here is that _LuaObject instances get cleaned up that were used in the Lua coroutine, and that refer to the lua state of the coroutine, thus ending up with a dead dangling pointer after the coroutine itself gets cleaned up.
Questions:
- can we detect if a lua coroutine has finished?
- is there some kind of trigger that Lua could call when a coroutine finishes?
- is the coroutine cleanup triggered from Python or from Lua? (If from Python, we could play with the reference counts to prevent it from being cleaned up before the Python objects that reference it.)
- should the
_LuaObjectreally refer to the lua state of the coroutine (from the start), or to the one of the runtime? - or maybe we need two lua state references for lua objects (if they are used in coroutines)?
I'm not sure if the fix needs to be this complex and would be happy to see something simpler.
I totally agree. I managed to simplify it in 17dd7ea by using LuaRuntime state for object unreferencing. I seems that LuaRuntime state and coroutine state share the same registry for object references and when using LuaRuntime state for unreferencing we can be sure that it is not dangling pointer of already finished coroutine state. Let me know what you think about this solution.
I notice that there are multiple occurrences where similar object unreferencing occurs too (LuaIter object), but I was not able to came with code sample that would cause segmentation fault. I'm wondering if I should applied this change also in there or not. Do you have any idea about that?
Not sure. In fact, I'm not even sure I understand why there should be a difference between self._state and self._runtime._state. I don't know exactly how the coroutine state works. Should self._state refer to the runtime state, maybe?