Address Neo4j misconfiguration and inefficiency issues
From Michael Hunger's response to your benchmarking article:
Just really quickly. Unfortunately, your benchmark has a number of issues that invalidate all of its Neo4j measurements.
We recommend users in general to ignore vendor benchmarks and test with their own hardware, data, use-cases for relevant and reliable results.
Here is a quick list from just skimming over it; I didn't measure or test anything, so don't assume it is all correct / working:
General:
- Neo4j is not an RDF database, so an RDF data model makes no sense
- using a community-provided Go driver whose performance has not been validated, rather than an official driver like JS, Java, .Net
- incorrect information in feature table
- memory usage can be configured
Writes:
- the MERGE query has no labels (e.g. :Node), so each statement does 2 full all-node scans
- no constraint for :Node(xid)
- no timing published for the CSV import (11s on my Mac for 1.1M triples)
- doesn't use transactions of e.g. 50k or 100k updates per request
- which is best achieved with a single query per tx and an UNWIND over a payload that is an array of structs
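The write-side points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the benchmark's actual code: the `:RELATED` relationship type, the `subject`/`object` row keys, and the helper names `batches` and `write_plan` are hypothetical, and the block only builds the statements and payloads without calling any driver.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Uniqueness constraint on :Node(xid); it is backed by an index that MERGE can use.
# Older releases (e.g. 3.x) spell this:
#   CREATE CONSTRAINT ON (n:Node) ASSERT n.xid IS UNIQUE
NODE_CONSTRAINT = "CREATE CONSTRAINT FOR (n:Node) REQUIRE n.xid IS UNIQUE"

# One statement per transaction; $rows is a list of maps ("array of structs").
# Assumption: :RELATED and the row keys are placeholders for illustration.
MERGE_BATCH = """
UNWIND $rows AS row
MERGE (s:Node {xid: row.subject})
MERGE (o:Node {xid: row.object})
MERGE (s)-[:RELATED]->(o)
"""

Row = Dict[str, Any]


def batches(rows: Iterable[Row], size: int = 50_000) -> Iterator[List[Row]]:
    """Yield consecutive chunks of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_plan(rows: Iterable[Row], size: int = 50_000) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair the single UNWIND statement with one parameter payload per batch."""
    return [(MERGE_BATCH, {"rows": chunk}) for chunk in batches(rows, size)]
```

Each `(statement, parameters)` pair would then be run in its own transaction with whichever driver is in use, after `NODE_CONSTRAINT` has been applied once.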
Reads:
- no use of parameters in reads
- disabled query plan cache (which was incorrectly understood as query result cache, which doesn't exist)
- no constraint for :Film(filmId)
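As a rough illustration of the read-side points, the sketch below contrasts a parameterized lookup with one that inlines the literal. The helper names and the `RETURN f` shape are assumptions made for this example; only the `:Film(filmId)` label and property come from the list above, and no driver is invoked.

```python
from typing import Any, Dict, Tuple

# Uniqueness constraint on :Film(filmId), so the lookup can use an index.
# Older releases (e.g. 3.x) spell this:
#   CREATE CONSTRAINT ON (f:Film) ASSERT f.filmId IS UNIQUE
FILM_CONSTRAINT = "CREATE CONSTRAINT FOR (f:Film) REQUIRE f.filmId IS UNIQUE"

# The query text never changes; only the parameter value does.
FILM_QUERY = "MATCH (f:Film {filmId: $filmId}) RETURN f"


def film_lookup(film_id: str) -> Tuple[str, Dict[str, Any]]:
    """Return a constant statement plus a parameter map for one film."""
    return FILM_QUERY, {"filmId": film_id}


def film_lookup_inlined(film_id: str) -> str:
    """Anti-pattern shown for contrast: the id is baked into the query text."""
    return "MATCH (f:Film {filmId: '%s'}) RETURN f" % film_id
```

With `film_lookup`, every request sends the same statement text, which keeps plan reuse straightforward and avoids building queries by string concatenation.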
Good luck with your development of dgraph, it looks like a good technology for RDF use-cases.
Cheers, Michael@neo4j
To those points raised by Michael, I had replied:
Just to clarify, we ran both the read-only and read-write workloads twice: once with the Neo4j query cache enabled and once with it disabled. All four results are presented.
The Go driver that we used did little more than call Neo4j over Bolt, so we judged it safe to use for benchmarking. Also, if you think any information in the feature table was incorrect, can you please send a mail to [email protected] with the correct version?
The particular issues you raised about optimizing reads and writes seem valid. Would you mind sending a PR to fix the way we query Neo4j? The code is here: https://github.com/dgraph-io/benchmarks/tree/master/data/neo4j
We'll be happy to re-run the numbers and update our post accordingly.
To which Michael had replied:
Unfortunately, I don't have the capacity to fix the issues in your test code, as our large community keeps me busy.
As I mentioned before, you might have good points, so please just send a PR to fix those. That's why we open sourced this code, so Neo4j folks can ensure that we're representing them in the best light possible. Complaining doesn't help, code does.
Did this get fixed?
No PR was ever raised.
I see
This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.