Big slowdown on ioarena tests when reusing data dir on a particular workload

Open dyu opened this issue 7 years ago • 5 comments

Here are the CLI args I used:

./ioarena -D $DB -B crud -l walon -m nosync -p $OUTPUT_DIR/data -n 10000000 -k 9 -v 127 -i yes -w 2

This completes fine.

Running the same command with -r 2 added (i.e. reusing the same data dir) triggers the slowdown: it takes more than an hour, and I terminated the benchmark when I noticed it.

I tried the commands above with nessdb, it was able to complete normally.

I've attached the logs: results-w2.tar.gz, results-w2r2-existing.tar.gz

dyu avatar Aug 28 '16 19:08 dyu

Here's the result with a fresh data dir (no existing db). results-w2r2.tar.gz

As a side note, it looks like sophia stores the keys entirely in RAM (very low read IOPS, large RAM usage). I guess the high write IOPS come from the heavy compactions, though.

Btw, this was run on a laptop: Intel i7 2.6 GHz, 4c/8t, 16 GB RAM, Ubuntu 14.04 x64.

dyu avatar Aug 28 '16 19:08 dyu

I forgot to mention that with a fresh data dir there is no slowdown (as you can see in the results-w2r2.tar.gz logs).

dyu avatar Aug 28 '16 19:08 dyu

Right now Sophia does not benefit from parallel multi-threaded access; in fact, it will run slower (-w 2). Only one thread does the work while the others wait for it to finish.

v2.2 is very different from other versions: it has a new storage architecture. The amount of RAM it uses scales with your write rate and expected storage capacity.

Sophia is designed for production cases where the exact worst-case write rate and dataset size are known, so that memory requirements can be worked out in advance. Some numbers: http://sophia.systems/v2.2/admin/memory_requirements.html

ioarena is actually a bad way to benchmark Sophia. It loads as much data as possible, which is bad because Sophia will need a lot of RAM to handle it, and reusing the database means it has to replay all the logs to start over. In the future I will try to add more options to make ioarena more suitable for Sophia benchmarking.
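To illustrate why reopening an existing data dir can cost more than starting fresh, here is a minimal toy sketch of log replay in Python. It is not Sophia's code or on-disk format: `ToyLogStore`, `reopen_cost`, and the JSON-lines log are all hypothetical, chosen only to show that a store which rebuilds its in-memory state from a write-ahead log must re-read every record written earlier.

```python
import json
import os
import tempfile


class ToyLogStore:
    """Hypothetical append-only store; state is rebuilt from the log on open."""

    def __init__(self, path):
        self.path = path
        self.data = {}       # in-memory view of all keys
        self.replayed = 0    # number of log records re-read at startup
        if os.path.exists(path):
            # Reusing an existing log: every earlier write is replayed.
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
                    self.replayed += 1
        self._log = open(path, "a", encoding="utf-8")

    def set(self, key, value):
        # Log first, then update the in-memory view.
        self._log.write(json.dumps([key, value]) + "\n")
        self.data[key] = value

    def close(self):
        self._log.close()


def reopen_cost(n):
    """Write n records, close, reopen, and return how many were replayed."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "toy.log")
        store = ToyLogStore(path)
        for i in range(n):
            store.set(f"k{i}", i)
        store.close()

        reopened = ToyLogStore(path)
        replayed = reopened.replayed
        reopened.close()
        return replayed
```

In this toy model the startup work grows linearly with the number of records written in the previous run, whereas a fresh directory has nothing to replay.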

Unfortunately, the only known good way for now to benchmark and use Sophia is to understand its requirements and write custom tests that simulate your real case (latencies).

Sorry about that.

pmwkaa avatar Aug 28 '16 20:08 pmwkaa

Btw, with only a single writer (like lmdb), how does the mvcc work? Do you have an example somewhere on a repo?

I'm thinking the worker threads do the CRUD (without applying changes) and then submit those to a single writer thread which would apply those changes (against a specific version)
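A minimal sketch of that idea in Python, assuming a plain in-memory dict as the store. The names `SingleWriterStore`, `worker`, and `run_demo` are hypothetical, and this is not how Sophia or LMDB implement it; version checks and conflict detection are left out. Worker threads only build batches of operations, and one dedicated writer thread applies them, bumping a version counter once per batch.

```python
import queue
import threading


class SingleWriterStore:
    """Toy store: many threads submit batches, one thread applies them."""

    def __init__(self):
        self._data = {}              # mutated only by the writer thread
        self.version = 0             # bumped once per applied batch
        self._inbox = queue.Queue()  # thread-safe hand-off to the writer
        self._writer = threading.Thread(target=self._apply_loop, daemon=True)
        self._writer.start()

    def submit(self, batch):
        """Called by workers: enqueue a list of (op, key, value) tuples."""
        self._inbox.put(batch)

    def _apply_loop(self):
        while True:
            batch = self._inbox.get()
            if batch is None:        # sentinel: stop the writer
                break
            for op, key, value in batch:
                if op == "set":
                    self._data[key] = value
                elif op == "delete":
                    self._data.pop(key, None)
            self.version += 1

    def get(self, key):
        return self._data.get(key)

    def size(self):
        return len(self._data)

    def close(self):
        """Drain everything queued so far, then stop the writer thread."""
        self._inbox.put(None)
        self._writer.join()


def worker(store, worker_id, n):
    # Build the changes locally without touching the store, then submit.
    batch = [("set", f"w{worker_id}-k{i}", i) for i in range(n)]
    store.submit(batch)


def run_demo(workers=4, n=100):
    store = SingleWriterStore()
    threads = [
        threading.Thread(target=worker, args=(store, w, n))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    store.close()
    return store
```

Because only the writer thread touches the dict, no lock is needed around the data itself; the queue is the single synchronization point. A real MVCC design would also let readers pin a version, which this sketch does not attempt.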

dyu avatar Sep 14 '16 13:09 dyu

Any news on this? There have been significant changes to the code base in the meantime...

dumblob avatar Jun 06 '19 11:06 dumblob