py-lmdb
Close/get race segfault
Ubuntu, lmdb version 0.94, CPython binding. The following code, added as a method to CrashTest in tests/crash_test.py, crashes about 90% of the time:
def testCloseGetRace(self):
    # Reader threads repeatedly open a read transaction and do a get()
    # while the main thread closes and reopens the environment.
    def reader(self):
        for i in range(1000):
            try:
                with self.env.begin() as txn:
                    txn.get(i.to_bytes(8, 'big'))
            except lmdb.Error:
                pass

    ts = [threading.Thread(target=reader, args=(self,)) for _ in range(2)]
    [t.start() for t in ts]
    for i in range(1000):
        self.env.close()
        self.env = lmdb.open(self.path, max_dbs=10)
    [t.join() for t in ts]
At least one crash mode is that a TransObject lives well past the lifetime of its EnvObject. This happens because, when env_clear calls clear on its children (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L1116), trans_clear can release the GIL (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L2941), allowing a new transaction to sneak in as a child that never gets cleared. The debug log confirms this.
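To make the window concrete, here is a heavily simplified sketch of the race, not the real cpython.c code: the struct layouts, the child-list field, and the clear_children name are assumptions, while mdb_txn_abort and the Py_BEGIN/END_ALLOW_THREADS macros are the real APIs involved.

#include <Python.h>
#include "lmdb.h"

typedef struct TransObject TransObject;
struct TransObject {
    MDB_txn *txn;
    TransObject *next;       /* sibling in the environment's child list */
};

typedef struct {
    MDB_env *env;
    TransObject *children;   /* transactions still referencing this env */
} EnvObject;

/* Hypothetical teardown path run from env.close() / env_clear. */
static void
clear_children(EnvObject *self)
{
    TransObject *child = self->children;
    while (child) {
        MDB_txn *txn = child->txn;
        child->txn = NULL;
        /* Dropping the GIL here is the problem: another Python thread can
         * now run env.begin() and append a brand-new child to
         * self->children that this loop will never visit. */
        Py_BEGIN_ALLOW_THREADS
        mdb_txn_abort(txn);
        Py_END_ALLOW_THREADS
        child = child->next;
    }
    /* Any child added during the window is left behind, still pointing at
     * the environment that is about to be destroyed. */
    self->children = NULL;
}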
Possible solutions: I can remove the GIL release around mdb_txn_abort, or I can mark the env so that transaction creation never gets that far. Ideas?
Yikes! This is a nice one :) I think with MDB_WRITEMAP, abort() can be expensive, so holding the GIL there might be problematic, but it's still a reasonable solution. How about a 'bool closing' flag in the Environment that causes an exception to be thrown in the Transaction constructor? That would work, if I'm understanding things correctly. Threading is hard!
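A minimal sketch of that flag idea, continuing the assumed layout from the previous sketch (the env_close and check_env_open names, the field layout, and the error type are all assumptions, not py-lmdb's actual code): the flag is flipped while the GIL is still held, so any Transaction constructor that runs afterwards sees it before it can register itself as a child.

#include <Python.h>
#include "lmdb.h"

typedef struct TransObject TransObject;   /* child-list node, as sketched above */

typedef struct {
    MDB_env *env;
    TransObject *children;
    int closing;              /* new: set before children are aborted */
} EnvObject;

static void clear_children(EnvObject *self);   /* as sketched above */

static void
env_close(EnvObject *self)
{
    self->closing = 1;        /* flipped while the GIL is held */
    clear_children(self);     /* may still drop the GIL around mdb_txn_abort */
    mdb_env_close(self->env);
    self->env = NULL;
}

/* Called at the top of the Transaction constructor, with the GIL held,
 * so a begin() racing with close() fails cleanly instead of attaching
 * itself to a dying environment. */
static int
check_env_open(EnvObject *env)
{
    if (env->closing || env->env == NULL) {
        PyErr_SetString(PyExc_RuntimeError, "environment is closing or closed");
        return -1;
    }
    return 0;
}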
Re: writemap, for large transactions MDB starts spilling dirty pages to the map before the transaction has completed. I can't remember whether it needs to perform any kind of rollback. Come to think of it, I don't think it does. So either of these solutions sounds fine, really.
I found a bunch more races, and I have a solution to them all in cpython.c. Now to tackle cffi.py...
Keeping this open. This is, in general, super problematic to solve because it incurs overhead on 99% of folks that don't do multithreading stuff.