sophia
Big slowdown on ioarena tests when reusing data dir on a particular workload
Here are the CLI args I used:
./ioarena -D $DB -B crud -l walon -m nosync -p $OUTPUT_DIR/data -n 10000000 -k 9 -v 127 -i yes -w 2
This completes fine.
Running the same command above with -r 2 (i.e. reusing the same data dir) triggers the slowdown (it took more than an hour; I terminated the benchmark when I noticed it).
I tried the same commands with nessdb, and it completed normally.
I've attached the logs: results-w2.tar.gz, results-w2r2-existing.tar.gz
Here's the result with a fresh data dir (no existing db): results-w2r2.tar.gz
As a side note, it looks like sophia stores the keys entirely in RAM (very low read IOPS, large RAM usage). The high write IOPS, I guess, comes from the heavy compactions.
Btw, this was run on a laptop: Intel i7 2.6 GHz, 4c/8t, 16 GB RAM, Ubuntu 14.04 x64.
I forgot to mention that with a fresh data dir there is no slowdown (as you can see in the results-w2r2.tar.gz logs).
Right now Sophia does not benefit from parallel multi-threaded access; in fact it will run slower (-w 2). Only one thread does the work while the others wait for it to finish.
v2.2 is very different from other versions; it has a new storage architecture. Its RAM usage grows with your write rate and expected storage capacity.
Sophia is designed for production cases where the worst-case write rate and dataset size are known in advance, so memory requirements can be calculated. Some numbers: http://sophia.systems/v2.2/admin/memory_requirements.html
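For the workload in this report, a rough back-of-envelope estimate makes the point. The key count, key size, and value size come from the ioarena flags above; the per-key index overhead is an assumed illustrative figure, not a number from Sophia's docs:

```python
# Back-of-envelope RAM estimate for the workload above.
n = 10_000_000      # -n 10000000
key = 9             # -k 9 (bytes)
value = 127         # -v 127 (bytes)
overhead = 40       # ASSUMED per-key in-memory index overhead (hypothetical)

index_ram = n * (key + overhead)   # if all keys are kept in RAM
raw_data = n * (key + value)       # logical dataset size on disk

print(f"index RAM ~ {index_ram / 2**30:.2f} GiB")
print(f"raw data  ~ {raw_data / 2**30:.2f} GiB")
```

Even with a modest assumed overhead, keeping 10M keys resident costs hundreds of MiB, consistent with the "large RAM usage" observed above.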
ioarena is actually a bad way to benchmark Sophia. It loads as much data as possible, which is bad because Sophia will need a lot of RAM to handle it, and reusing the database will require replaying all the logs before starting over. In the future I will try to add more options to make ioarena more suitable for Sophia benchmarking.
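A toy sketch of why reusing the data dir is slow under this design (illustrative only, not Sophia's internals): a log-structured store must replay its write-ahead log on open to rebuild the in-memory index, so a reused dir from a 10M-key run has millions of records to process before the benchmark can even start.

```python
def replay_log(records):
    """Rebuild the in-memory index by re-applying every logged write.
    Cost is linear in the number of records, regardless of final size."""
    index = {}
    for op, key, value in records:
        if op == "set":
            index[key] = value
        elif op == "delete":
            index.pop(key, None)
    return index

# A fresh data dir has an empty log, so open is fast.
log = [("set", "a", 1), ("set", "b", 2), ("delete", "a", None)]
print(replay_log(log))  # {'b': 2}
```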
Unfortunately, the only known good way to benchmark and use Sophia right now is to understand its requirements and write custom tests that simulate your real use case (latencies).
Sorry about that.
Btw, with only a single writer (like LMDB), how does the MVCC work? Do you have an example somewhere in a repo?
I'm thinking the worker threads do the CRUD (without applying changes) and then submit those to a single writer thread, which applies them against a specific version.
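The scheme described above can be sketched generically: readers take immutable snapshots, workers prepare change sets against a snapshot version, and a single writer thread applies them serially. This is a minimal illustration of that idea, not Sophia's actual implementation; all names here are invented for the example.

```python
import threading
import queue

class SingleWriterMVCC:
    """Toy MVCC store: readers see immutable snapshots; one writer
    thread applies all submitted change sets serially."""

    def __init__(self):
        self.versions = [{}]          # versions[i] = immutable snapshot i
        self.q = queue.Queue()        # pending (base_version, changes, done)
        self.lock = threading.Lock()  # guards the versions list
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def snapshot(self):
        """Readers grab the latest snapshot; it never mutates afterwards."""
        with self.lock:
            return len(self.versions) - 1, self.versions[-1]

    def submit(self, base_version, changes):
        """Workers hand prepared changes to the single writer thread."""
        done = threading.Event()
        self.q.put((base_version, changes, done))
        return done

    def _apply_loop(self):
        while True:
            base_version, changes, done = self.q.get()
            with self.lock:
                new = dict(self.versions[-1])  # copy-on-write new version
                new.update(changes)
                self.versions.append(new)
            done.set()

store = SingleWriterMVCC()
v, snap = store.snapshot()
store.submit(v, {"k1": "v1"}).wait()
store.submit(v, {"k2": "v2"}).wait()
_, latest = store.snapshot()
print(latest)  # {'k1': 'v1', 'k2': 'v2'}
```

A real engine would also conflict-check each change set against writes committed since its base version (optimistic concurrency); that check is omitted here for brevity.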
Any news on this? There have been significant changes to the code base in the meantime...