RNeo4j
Is RNeo4j Transactional Endpoint slow?
I'm used to importing a CSV via the Neo4j console. I had 50000 rows, and after setting up an index I imported them in about 0.8 seconds.
I tried the same thing today with the transactional endpoint and it took 3 minutes.
Is it that slow or am I doing something wrong?
Can you show me your code?
Sure. If I go to the Neo4j web interface and do this, for example:
CREATE INDEX ON :Person(person_ID)
//# Added 1 index, statement executed in 1662 ms.
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///home/data/SPRINT3-a-v1.csv"
AS row MERGE (a:Person { person_ID: row.person_id1 , source:"a"}) RETURN (a)
//# Returned 12613 rows in 894 ms
Now if I run the following code for the same data (after loading the data into a dataframe called data):
library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")
addIndex(graph, "Person", "person_ID")
getIndex(graph)

t1 <- Sys.time()
query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'
t = newTransaction(graph)
for (i in 1:nrow(data)) {
  person_ID = data[i, ]$person_id1
  appendCypher(t, query, person_ID = person_ID)
}
commit(t)
t2 <- Sys.time()
t2 - t1
I get:
Time difference of 3.250754 mins
Any news about this? I might be doing something wrong, but from what I understood this is the way to use the transactional endpoint, and the performance is worrisome.
Sorry, I thought I had responded to you. The problem is that you're committing in batches of 1000 in LOAD CSV and in a single batch of 12613 in the R code, so it's not really a fair comparison. Can you commit in batches of 1000 in your R code and get back to me?
OK, will do that and get back to you.
Any workaround for this? - Thanks
Apologies for the long delay; some other projects took my time. Back to the problem at hand: I ran the following code, which loads the same data as the LOAD CSV command in Cypher.
library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")
clear(graph)

setwd("/home/bigdata/data/")
data <- read.table(file = "SPRINT3-a-v1.csv", sep = ",", header = TRUE)

addIndex(graph, "Person", "person_ID")
getIndex(graph)

query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'

t1 <- Sys.time()
tx = newTransaction(graph)
for (i in 1:nrow(data)) {
  if (i %% 1000 == 0) {
    # Commit the current transaction.
    commit(tx)
    print(paste("Batch:", i / 1000, "committed."))
    # Open a new transaction.
    tx = newTransaction(graph)
  }
  person_ID = data[i, ]$person_id1
  appendCypher(tx, query, person_ID = person_ID)
}
commit(tx)
print("Last batch committed.")
print("All done!")
t2 <- Sys.time()
t2 - t1
I think this makes a fair comparison (loading the data in batches of 1000), but I still get 3 minutes for the operation. Apologies if this code is wrong and I have not understood the concept well.
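One variant that might be worth trying, to stay on the transactional endpoint but cut the one-statement-per-row overhead, is to ship each batch as a single UNWIND statement. This is only an untested sketch: it assumes RNeo4j serializes an R list of named lists into a JSON array of maps for the {rows} parameter, and the batch size of 1000 is arbitrary.

library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")

# One parameterized statement per batch; the rows are unwound server-side
# instead of being appended as one statement per row.
query = 'UNWIND {rows} AS row MERGE (a:Person { person_ID: row.person_ID, source: "a" })'

ids = as.character(data$person_id1)
batches = split(ids, ceiling(seq_along(ids) / 1000))

t1 <- Sys.time()
for (batch in batches) {
  # Build a list of maps, one per row, to bind to the {rows} parameter.
  rows = lapply(batch, function(id) list(person_ID = id))
  tx = newTransaction(graph)
  appendCypher(tx, query, rows = rows)
  commit(tx)
}
t2 <- Sys.time()
t2 - t1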
I have the same problem. Neither createNode() / createRel() nor appendCypher() is fast enough to use. My workaround is to use getNode() and cypher() with normal queries, and to create CSV files and import them via LOAD CSV. Both have the disadvantage that the R code is not really understandable if the reader doesn't know what Cypher/Neo4j is, and creating the CSV files needs storage.
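For reference, a minimal sketch of that CSV workaround: the file path is an assumption and must be readable by the Neo4j server process, and USING PERIODIC COMMIT is left out because it can be rejected when sent through the transactional endpoint, depending on the Neo4j version.

library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")

# Write the data frame to a location the Neo4j server can read
# (hypothetical path; adjust to your setup).
csv_path = "/home/data/persons_tmp.csv"
write.csv(data, file = csv_path, row.names = FALSE)

# Let the server do the import; the LOAD CSV statement is issued from R via cypher().
query = paste0(
  "LOAD CSV WITH HEADERS FROM 'file://", csv_path, "' AS row ",
  'MERGE (a:Person { person_ID: row.person_id1, source: "a" })'
)
cypher(graph, query)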
Thanks for your hard work.
Sorry, I don't think the transactional endpoint will ever be as fast as LOAD CSV or neo4j-import. createNode() and createRel() definitely won't be as fast, as they create nodes / relationships one at a time in a single transaction.