python-rocksdb
VectorMemtableFactory broken on 5.3.6
On 5.3.6, opts.memtable_factory = rocksdb.VectorMemtableFactory() doesn't seem to work. It gives the error: Invalid argument: Memtable doesn't concurrent writes (allow_concurrent_memtable_write). But allow_concurrent_memtable_write doesn't seem to be exposed in the API, so it can't be changed.
@EliFinkelshteyn
I have not tested the code on 5.3.6, but it's ok on 5.4.6.
Here is my test code:
import rocksdb
opts = rocksdb.Options()
opts.memtable_factory = rocksdb.VectorMemtableFactory()
I've confirmed this does not work on either 5.3.6 or 5.4.6. Your test code works fine, but if you then attempt to create a db, I'm guessing you'll see the same error I got.
import rocksdb
opts = rocksdb.Options()
opts.memtable_factory = rocksdb.VectorMemtableFactory()
opts.create_if_missing = True
test_db = rocksdb.DB("my_db", opts)
gives
InvalidArgument: Invalid argument: Memtable doesn't concurrent writes (allow_concurrent_memtable_write)
@EliFinkelshteyn
Got it! This is the bug (it should not throw such exceptions). And it seems that allow_concurrent_memtable_write is only implemented in SkipListFactory, so the following code works.
import rocksdb
opts = rocksdb.Options()
opts.memtable_factory = rocksdb.SkipListMemtableFactory()
opts.create_if_missing = True
test_db = rocksdb.DB("my_db", opts)
I will fix this in the next release.
thanks
@EliFinkelshteyn
I have submitted a fix patch to the dev-issue9 branch; you can test whether it fixes your problem.
git clone https://github.com/twmht/python-rocksdb.git -b dev-issue9
cd python-rocksdb
python setup.py install
The test code:
import rocksdb
opts = rocksdb.Options()
opts.allow_concurrent_memtable_write = False
opts.memtable_factory = rocksdb.VectorMemtableFactory()
opts.create_if_missing = True
test_db = rocksdb.DB("/tmp/test", opts)
thanks
I can confirm this fixes the issue. Thanks!
Unfortunately, it doesn't seem to change timings at all in my initial bulk data load, as the docs imply it should, but I guess YMMV.
Anyway, thanks for the quick fix and for taking over maintenance of this library! One last comment: it might be worthwhile to change the error message "InvalidArgument: Invalid argument: Memtable doesn't concurrent writes (allow_concurrent_memtable_write)" to something more explicit, like "the allow_concurrent_memtable_write option must be false to use your desired memtable_factory", to give guidance to anyone running into this in the future.
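To make the suggestion concrete, here is a rough sketch in plain Python of what such a validation could look like. The exception class and function here are stand-ins, not the library's actual code, and the factory names are only illustrative:

```python
class InvalidArgument(ValueError):
    """Stand-in for the rocksdb InvalidArgument error."""

def check_memtable_options(factory_name, allow_concurrent_memtable_write):
    # Only the skip-list memtable implements concurrent writes, so any
    # other factory combined with the flag should fail with a message
    # that names the offending option and says what to do about it.
    if factory_name != "SkipListFactory" and allow_concurrent_memtable_write:
        raise InvalidArgument(
            "the allow_concurrent_memtable_write option must be false "
            "to use your desired memtable_factory (%s)" % factory_name)
```

With a check like this, the error names the option the user actually has to change, instead of the terse internal message.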
@EliFinkelshteyn
thanks! There are more features coming.
if this patch fixed the problem, please close this issue.
I will look into the other problem you pointed out, but if you have other problems, please open a new issue for them.
Happy to. Figured you'd want to do it with the patch merge, but don't mind doing it here. Thanks again for the quick fix!
@EliFinkelshteyn
One more question, what does this mean?
it doesn't seem to change timings at all in my initial bulk data load
You may show some expected results including the test code, so that I can look into that.
thanks!
Sure, what I mean is the tutorial docs state: "For initial bulk loads the Vector-MemtableFactory makes sense." That led me to believe I'd see faster times for a large initial bulk load by using VectorMemtableFactory, but in fact I got identical times using it. Some (very simple) test code:
import rocksdb
import ujson
import random
import string
import time

def generate_ascii_str(length=10):
    return ''.join(random.choice(string.letters + string.digits)
                   for _ in range(length))

dummy_datas = list()
for i in range(1000000):
    dummy_data = \
        {"datum1": "this is a data/blah blah blah/" +
                   generate_ascii_str(10),
         "datum2": "this is a other data/blah blah blah/" +
                   generate_ascii_str(10),
         "description": "this is a description/blah blah blah/" +
                        generate_ascii_str(10)}
    dummy_data_str = ujson.dumps(dummy_data).decode('utf8')
    dummy_datas.append(dummy_data_str)

opts = rocksdb.Options()
# set both of these so we don't keep any logs (we never need them)
opts.wal_ttl_seconds = 0
opts.wal_size_limit_mb = 0
opts.db_log_dir = "LOGS"
opts.compression = rocksdb.CompressionType.zlib_compression
opts.write_buffer_size = 2400000000
opts.allow_concurrent_memtable_write = False
opts.memtable_factory = rocksdb.VectorMemtableFactory()
opts.create_if_missing = True
db = rocksdb.DB("/tmp/test_rocks_db", opts)

begin_time = time.time()
batch = rocksdb.WriteBatch()
for i, item in enumerate(dummy_datas):
    # accumulate the puts in the batch rather than calling db.put()
    # per key, then flush the whole batch with a single write
    batch.put(bytes(i), bytes(item))
db.write(batch, disable_wal=True)
end_time = time.time()
print "rocksdb creation took:"
print end_time - begin_time