py-lmdb
Close/get race segfault
Ubuntu, lmdb version 0.94, CPython binding. The following code, added as a method to CrashTest in tests/crash_test.py, crashes about 90% of the time:
def testCloseGetRace(self):
    # Reader threads repeatedly open a read transaction and do a get()
    # while the main thread closes and reopens the environment.
    def reader(self):
        for i in range(1000):
            try:
                with self.env.begin() as txn:
                    txn.get(i.to_bytes(8, 'big'))
            except lmdb.Error:
                pass

    ts = [threading.Thread(target=reader, args=(self,)) for _ in range(2)]
    [t.start() for t in ts]
    for i in range(1000):
        self.env.close()
        self.env = lmdb.open(self.path, max_dbs=10)
    [t.join() for t in ts]
At least one crash mode is that a TransObject lives well past the lifetime of its EnvObject. This happens because, when env_clear calls clear on its children (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L1116), trans_clear can release the GIL (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L2941), allowing a new transaction to sneak in as a child that never gets cleared. The debug log confirms this.
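To make the window concrete, here is a heavily simplified sketch of the race, not the real cpython.c code: the struct layouts, the child-list field, and the clear_children name are assumptions, while mdb_txn_abort and the Py_BEGIN/END_ALLOW_THREADS macros are the real APIs involved.

#include <Python.h>
#include "lmdb.h"

typedef struct TransObject TransObject;
struct TransObject {
    MDB_txn *txn;
    TransObject *next;       /* sibling in the environment's child list */
};

typedef struct {
    MDB_env *env;
    TransObject *children;   /* transactions still referencing this env */
} EnvObject;

/* Hypothetical teardown path run from env.close() / env_clear. */
static void
clear_children(EnvObject *self)
{
    TransObject *child = self->children;
    while (child) {
        MDB_txn *txn = child->txn;
        child->txn = NULL;
        /* Dropping the GIL here is the problem: another Python thread can
         * now run env.begin() and append a brand-new child to
         * self->children that this loop will never visit. */
        Py_BEGIN_ALLOW_THREADS
        mdb_txn_abort(txn);
        Py_END_ALLOW_THREADS
        child = child->next;
    }
    /* Any child added during the window is left behind, still pointing at
     * the environment that is about to be destroyed. */
    self->children = NULL;
}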
Possible solutions: I can remove the GIL release around mdb_txn_abort, or I can mark the env so that transaction creation never gets that far. Ideas?
Yikes! This is a nice one :) I think with MDB_WRITEMAP, abort() can be expensive, so holding the GIL there might be problematic, but it's still a reasonable solution. How about a 'bool closing' flag in the Environment that causes an exception to be thrown in the Transaction constructor? That would work, if I'm understanding things correctly. Threading is hard!
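A minimal sketch of that flag idea, continuing the assumed layout from the previous sketch (the env_close and check_env_open names, the field layout, and the error type are all assumptions, not py-lmdb's actual code): the flag is flipped while the GIL is still held, so any Transaction constructor that runs afterwards sees it before it can register itself as a child.

#include <Python.h>
#include "lmdb.h"

typedef struct TransObject TransObject;   /* child-list node, as sketched above */

typedef struct {
    MDB_env *env;
    TransObject *children;
    int closing;              /* new: set before children are aborted */
} EnvObject;

static void clear_children(EnvObject *self);   /* as sketched above */

static void
env_close(EnvObject *self)
{
    self->closing = 1;        /* flipped while the GIL is held */
    clear_children(self);     /* may still drop the GIL around mdb_txn_abort */
    mdb_env_close(self->env);
    self->env = NULL;
}

/* Called at the top of the Transaction constructor, with the GIL held,
 * so a begin() racing with close() fails cleanly instead of attaching
 * itself to a dying environment. */
static int
check_env_open(EnvObject *env)
{
    if (env->closing || env->env == NULL) {
        PyErr_SetString(PyExc_RuntimeError, "environment is closing or closed");
        return -1;
    }
    return 0;
}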
Re: writemap, for large transactions MDB starts spilling dirty pages to the map before the transaction has completed. I can't remember whether it needs to perform any kind of rollback. Come to think of it, I don't think it does. So either of these solutions sounds fine, really.
I found a bunch more races, and I have a solution to them all in cpython.c. Now to tackle cffi.py...
Keeping this open. This is, in general, super problematic to solve because it incurs overhead on 99% of folks that don't do multithreading stuff.