benchmark-postgres-mongo

Thoughts on improving the benchmarking

Open · xdg opened this issue · 2 comments

Hi. I love that you put this together. Thank you!

I'm a team lead at MongoDB, and my team works on our new Go driver (not mgo, which is a community-based project). Work like this repo is great for helping us see how the new driver stacks up.

I have a few thoughts on improving the benchmarks for you to consider:

  • The benchmark includes connection dialing and the auth handshake in the timed section. Unless your microservice is short-lived and has to dial and handshake on every request, it would be better to pull that outside the timed section and pass the db/session objects into your query benchmark functions (see the first sketch after this list).
  • It looks like the benchmarking runs a single query. You'd get more reliable results by looping the query several times (once the connection handshake is factored out) and averaging the results; Go's testing.B does exactly this, as the same sketch shows.
  • The Postgres benchmark uses an iterator, calling Next and Scan to fill in and print one allocated record at a time, but the mgo benchmark calls All to fill an array with all records. The latter has to do much more memory allocation. It would be more consistent to call iter := c.Pipe(query).Iter() in the mgo case and iterate record by record, so the memory workload is similar between the two benchmarks (see the mgo sketch below).
  • The benchmark includes printing out all records, which means record stringification is counted in the benchmark time -- not a typical microservice workload, I would think. While the impact should be common to both cases, you might consider either omitting it (so that the benchmark measures just record retrieval) or adding a synthetic workload more typical of a microservice, e.g. converting each retrieved record to JSON but not printing it (see the last sketch below).
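
Here is a minimal sketch of the first two points; the connection string and the items table are made up for illustration, and the repo's actual schema and query would go in their place:

```go
package bench

import (
	"database/sql"
	"testing"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func BenchmarkPostgresQuery(b *testing.B) {
	// Dial and authenticate once, outside the timed section.
	db, err := sql.Open("postgres", "postgres://localhost/bench?sslmode=disable")
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()
	if err := db.Ping(); err != nil { // force the handshake now, not inside the loop
		b.Fatal(err)
	}

	b.ResetTimer() // exclude all of the setup above from the measurement
	for i := 0; i < b.N; i++ { // the framework picks b.N and reports the per-iteration average
		rows, err := db.Query("SELECT id, name FROM items LIMIT 500") // hypothetical query
		if err != nil {
			b.Fatal(err)
		}
		for rows.Next() {
			var id int
			var name string
			if err := rows.Scan(&id, &name); err != nil {
				b.Fatal(err)
			}
		}
		rows.Close()
	}
}
```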
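
For the mgo side, iterating instead of calling All looks roughly like this; the collection and pipeline arguments stand in for the repo's actual ones:

```go
package bench

import (
	"testing"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// A sketch only: callers would pass in the repo's real collection and
// aggregation pipeline after doing session setup outside the timing.
func benchMongoIter(b *testing.B, c *mgo.Collection, query []bson.M) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		iter := c.Pipe(query).Iter()
		var record bson.M
		for iter.Next(&record) {
			// Consume one record at a time, mirroring rows.Next/rows.Scan
			// on the Postgres side, so the allocation workloads match.
		}
		if err := iter.Close(); err != nil {
			b.Fatal(err)
		}
	}
}
```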
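
And a hypothetical synthetic workload to use in place of printing, so the stringification work stays in the measurement but terminal I/O does not:

```go
package bench

import "encoding/json"

// stringify converts a record to JSON without printing it; the name
// and signature are made up for illustration.
func stringify(record interface{}) int {
	buf, err := json.Marshal(record)
	if err != nil {
		return 0
	}
	return len(buf) // return something so the work can't be optimized away
}
```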

— xdg, Mar 12 '18

Hey. Thank you for your feedback!!! Please let me know when the new driver is ready, and I'll update the benchmark.

— cn0047, Apr 06 '18

Totally agree with the above comments.

These benchmarks are based on 500/5k entries.

It might also be interesting, for a particular query (averaged over 20 tries), to check both databases' performance at different load points. Maybe at some point one DB becomes more performant than the other? That criterion would then become the one to consider when choosing (a sketch with Go sub-benchmarks follows).
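
A rough sketch of what that could look like with Go sub-benchmarks; the load points and the queryN helper are made up for illustration:

```go
package bench

import (
	"fmt"
	"testing"
)

// queryN is a placeholder for running the benchmarked query against
// a dataset seeded with n entries.
func queryN(n int) { _ = n }

func BenchmarkQueryByLoad(b *testing.B) {
	for _, size := range []int{500, 5000, 50000} { // hypothetical load points
		b.Run(fmt.Sprintf("entries=%d", size), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				queryN(size)
			}
		})
	}
}
```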

By the way, it would also be nice to display the DB versions in the benchmark headers (sketch below).
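
Something like this could fetch the versions for the headers, assuming an already-open *sql.DB for Postgres and an *mgo.Session for MongoDB:

```go
package bench

import (
	"database/sql"

	mgo "gopkg.in/mgo.v2"
)

// dbVersions returns both server versions for display in the bench
// headers; a sketch, not the repo's actual code.
func dbVersions(db *sql.DB, session *mgo.Session) (pg, mongo string, err error) {
	if err = db.QueryRow("SELECT version()").Scan(&pg); err != nil {
		return
	}
	info, err := session.BuildInfo() // server build info; includes the Version string
	if err != nil {
		return
	}
	return pg, info.Version, nil
}
```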

Great initiative!

— Sharlaan, Jul 02 '18