
Reader slow initialization

Open andr0s opened this issue 12 years ago • 4 comments

Hey there,

thanks for a great lib, I find it much more pythonic than py-lmdb. However, I've found an issue. When I create a DB and open a Reader in the same script, everything is fine and the speed is great. But if I first create the DB, exit, and then start another script with a new Reader on the same database, the speed is really bad (the script seems to be loading something, but I couldn't even wait for it to finish on my 3 million key database). So that's probably an issue.

Also, can I use the threading or multiprocessing modules with your lib? I've seen an example for py-lmdb, but again, I'd prefer using your library as I just like it more.

Thanks!

andr0s avatar Sep 14 '13 04:09 andr0s

Very weird...

Did you close your transaction properly when the creation of the DB was done? To check, do a quick run of "mdb_stat -arfe /path/to/your_db" in the CLI. If the entry count matches what you wrote, the write transaction was committed successfully.
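For what it's worth, the check above can also be scripted. This is a minimal sketch: `parse_entries` is a hypothetical helper (not part of pymdb-lightning) that pulls the entry count out of mdb_stat's standard output, and the path is a placeholder.

```python
import subprocess

def parse_entries(stat_output):
    """Return the first 'Entries:' count from mdb_stat output, or None.

    mdb_stat prints a line like '  Entries: 3000000' in its stats block.
    """
    for line in stat_output.splitlines():
        line = line.strip()
        if line.startswith("Entries:"):
            return int(line.split(":", 1)[1])
    return None

# Assumed usage (requires the LMDB command-line tools to be installed):
# out = subprocess.run(["mdb_stat", "-arfe", "/path/to/your_db"],
#                      capture_output=True, text=True).stdout
# print(parse_entries(out))
```

If the number returned matches the number of keys you inserted, the write transaction committed.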

To reproduce this issue, could you please provide more details e.g. some code snippets?

Yes, this library handles multi-threading and multi-processing scenarios just fine, as LMDB itself does. But beware of some caveats when using multi-threading; LMDB's documentation describes them well.
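As a rough sketch of the multi-process pattern: LMDB allows many concurrent read transactions, so the usual shape is one Reader per worker process. The `Reader` constructor, `get()` call, and DB path below are assumptions about the API based on this thread; a plain dict stands in for the database so the sketch runs without LMDB installed.

```python
from multiprocessing import Pool

# Stand-in for the LMDB environment so the example is self-contained.
FAKE_DB = {b"k1": b"v1", b"k2": b"v2"}

def read_key(key):
    # Real code would open one Reader per worker process, e.g.:
    #   reader = mdb.Reader('/path/to/db')   # assumed constructor
    #   return reader.get(key)               # hypothetical lookup call
    return FAKE_DB.get(key)

if __name__ == "__main__":
    # Each pool worker does its own lookups; read transactions don't block
    # each other in LMDB.
    with Pool(2) as pool:
        values = pool.map(read_key, [b"k1", b"k2"])
```

The main caveat on the threading side is that read transactions are per-thread and there is only one write transaction at a time per environment, so don't share a single transaction object across threads.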

Thanks

ncloudioj avatar Sep 15 '13 02:09 ncloudioj

Well, might be better just to share the code: http://dpaste.com/hold/1380854/

I added line 63 after you wrote about closing the DB. And now it seems it's always really slow =)) I mean, if I write the DB, don't execute writer.close(), and open a reader after that, the speed is great. But if the code is executed as is, writing-closing-reading: Method <function filter1 at 0x106611848> : took 579 seconds vs ~6 seconds if I don't close the writer.

Mac OS 10.7.3, standard Python 2.7, mdb compiled exactly using the instructions in your readme.

andr0s avatar Sep 15 '13 04:09 andr0s

Hey andr0s,

The slowness is probably due to the writer.put(k, v) function, which creates a new write transaction for every call to make sure the data is actually inserted into the DB.

To store a large number of KVs, use writer.mput(data) instead. The argument data is any iterable of KV tuples (ideally a generator of KVs). This function handles the time-consuming write transaction in a smarter way.
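The advice above might look like this in practice. `mput` is taken from this thread; the `mdb.Writer('/path/to/db')` constructor is an assumption about the API, so those calls are left commented. The generator itself is the point: it feeds KV tuples lazily, so the whole dataset never has to sit in memory.

```python
def kv_stream(n):
    """Lazily yield (key, value) byte tuples for mput() to consume."""
    for i in range(n):
        yield (f"key-{i}".encode(), f"value-{i}".encode())

# Assumed usage, batching millions of inserts into mput() instead of
# one put() (and thus one write transaction) per key:
# writer = mdb.Writer('/path/to/db')    # assumed constructor
# writer.mput(kv_stream(3_000_000))     # bulk insert from the generator
# writer.close()                        # commit and release the write lock
```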

Also, on Mac OS X we've encountered performance drops from time to time. Since LMDB is a memory-mapped KV store, it's up to the OS to decide when to sync memory pages to disk. On Linux servers, however, LMDB always performs as expected; your script completed in a flash on our Linux box.

Thanks

ncloudioj avatar Sep 15 '13 13:09 ncloudioj

Hm... that's a bit strange, because I saw high disk I/O after I .put() everything (so I assume the code was flushing the data to the HDD at that point). Once the I/O rate dropped, I assumed all the data was written and all the transactions were closed, so from that moment on there shouldn't be any difference for the Reader. OK, I'll try .mput(). Thank you.

andr0s avatar Sep 16 '13 01:09 andr0s