benchmarks Address Neo4j misconfiguration and inefficiency issues

From Michael Hunger's response to your benchmarking article:

Just really quickly. Unfortunately, your benchmark has a number of issues that invalidate all its Neo4j measurements.

In general, we recommend that users ignore vendor benchmarks and test with their own hardware, data, and use cases for relevant and reliable results.

Here is a quick list from just skimming over it. I didn't measure or test anything, so don't assume it is all correct or working:

General:

Neo4j is not an RDF database, so an RDF data model makes no sense

uses a community-provided Go driver whose performance has not been validated, not an official driver such as JS, Java, or .NET

incorrect information in feature table

memory usage can be configured

Writes:

MERGE query without labels (:Node), so each statement does 2 full all-node scans

no constraint for :Node(xid)

no timing published for CSV import (11s on my Mac for 1.1M triples)

doesn't use transactions of e.g. 50k or 100k updates per request, which is best achieved with a single query per tx and an UNWIND over a payload that is an array of structs
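The write-side points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the benchmark's actual code: the `:REL` relationship type, the `subject`/`predicate`/`object` row keys, and the `run_in_tx` callback are hypothetical names, and the constraint statement uses the Neo4j 3.x syntax that was current when this thread was written. The idea is one parameterized statement per transaction, with labeled MERGE clauses backed by a uniqueness constraint on :Node(xid).

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]

# Uniqueness constraint so that MERGE on :Node(xid) can use an index
# instead of scanning all nodes (Neo4j 3.x syntax; run once before loading).
CREATE_CONSTRAINT = "CREATE CONSTRAINT ON (n:Node) ASSERT n.xid IS UNIQUE"

# One statement per transaction; $rows is a list of maps.
# The :REL type and the row keys are hypothetical, chosen for illustration.
BATCH_MERGE = (
    "UNWIND $rows AS row "
    "MERGE (s:Node {xid: row.subject}) "
    "MERGE (o:Node {xid: row.object}) "
    "MERGE (s)-[:REL {predicate: row.predicate}]->(o)"
)


def batches(rows: List[Row], size: int = 50_000) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load(rows: List[Row],
         run_in_tx: Callable[[str, dict], None],
         size: int = 50_000) -> int:
    """Send each batch as one query in one transaction.

    `run_in_tx` is a caller-supplied (hypothetical) function that executes
    a query with parameters inside a single transaction using whatever
    driver is in use. Returns the number of transactions issued.
    """
    sent = 0
    for chunk in batches(rows, size):
        run_in_tx(BATCH_MERGE, {"rows": chunk})
        sent += 1
    return sent
```

With a real driver, `run_in_tx` would wrap that driver's own transaction call; the batch size of 50k is taken from the suggestion above and would need tuning against available memory.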

Reads:

no use of parameters in reads

disabled the query plan cache (which was misunderstood as a query result cache, which doesn't exist)

no constraint for :Film(filmId)
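The read-side points can be illustrated with a small sketch. It assumes, as the remark about the plan cache suggests, that cached plans are looked up by query text, so inlining a literal ID produces a new text (and a new planning step) per lookup, while a parameterized query keeps one text for all lookups. The function names and the returned shape are hypothetical, and the constraint statement uses Neo4j 3.x syntax.

```python
from typing import Dict, Tuple

# Uniqueness constraint so lookups by filmId use an index (Neo4j 3.x syntax).
CREATE_FILM_CONSTRAINT = "CREATE CONSTRAINT ON (f:Film) ASSERT f.filmId IS UNIQUE"

# A single query text shared by every lookup; only the parameter changes.
FILM_BY_ID = "MATCH (f:Film {filmId: $filmId}) RETURN f"


def literal_query(film_id: str) -> str:
    """Anti-pattern: the ID is inlined, so every ID yields a different text."""
    return f'MATCH (f:Film {{filmId: "{film_id}"}}) RETURN f'


def parameterized_query(film_id: str) -> Tuple[str, Dict[str, str]]:
    """Preferred: constant query text plus a parameter map."""
    return FILM_BY_ID, {"filmId": film_id}


if __name__ == "__main__":
    ids = ["m1", "m2", "m3"]
    distinct_literal = {literal_query(i) for i in ids}
    distinct_param = {parameterized_query(i)[0] for i in ids}
    # Three distinct texts to plan versus one reusable text.
    print(len(distinct_literal), len(distinct_param))
```

Besides plan reuse, passing values as parameters also avoids quoting and injection problems that come with building query strings by hand.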

Good luck with your development of dgraph, it looks like a good technology for RDF use-cases.

Cheers, Michael@neo4j

Jun 03 '17 04:06 InverseFalcon

To those points raised by Michael, I had replied:

Just to clarify, we ran both the read-only and read-write workloads twice: once with the Neo4j query cache enabled and once with it disabled. All four results are presented.

The Go driver that we used did little more than just call Neo4j over Bolt. So, we determined it safe to be used for benchmarking. Also, if you think any information in the feature table was incorrect, can you please send a mail to [email protected] with the correct version?

The particular issues you raised about optimizing reads and writes seem valid. Would you mind sending a PR to fix the way we query Neo4j? The code is here: https://github.com/dgraph-io/benchmarks/tree/master/data/neo4j

We'll be happy to re-run the numbers and update our post accordingly.

To which Michael had replied:

Unfortunately, I don't have the capacity to fix the issues in your test code, as our large community keeps me busy.

As I mentioned before, you might have good points, so please just send a PR to fix those. That's why we open sourced this code, so Neo4j folks can ensure that we're representing them in the best light possible. Complaining doesn't help, code does.

Jun 03 '17 05:06 manishrjain

Did this get fixed?

Mar 06 '18 06:03 ShalokShalom

No PR was ever raised.

Mar 06 '18 22:03 manishrjain

I see

Mar 07 '18 08:03 ShalokShalom

This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

Jul 12 '24 01:07 github-actions[bot]
