RNeo4j
Is RNeo4j Transactional Endpoint slow?
I'm used to importing a CSV via the Neo4j console. I had 50000 rows, and after setting up an index I imported them in about 0.8 seconds.
I tried the same thing today with the transactional endpoint and it took 3 minutes.
Is it that slow or am I doing something wrong?
Can you show me your code?
Sure. If I go to the Neo4j web interface and do this, for example:
CREATE INDEX ON :Person(person_ID)
//# Added 1 index, statement executed in 1662 ms.
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///home/data/SPRINT3-a-v1.csv"
AS row MERGE (a:Person { person_ID: row.person_id1 , source:"a"}) RETURN (a)
//# Returned 12613 rows in 894 ms
Now if I run the following code for the same data (after loading the data into a dataframe called data):
library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")
addIndex(graph, "Person", "person_ID")
getIndex(graph)

t1 <- Sys.time()
query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'
t = newTransaction(graph)
for (i in 1:nrow(data)) {
  person_ID = data[i, ]$person_id1
  appendCypher(t, query, person_ID = person_ID)
}
commit(t)
t2 <- Sys.time()
t2 - t1
I get:
Time difference of 3.250754 mins
Any news about this? I might be doing something wrong, but from what I understood this is the way to use the transactional endpoint, and the performance is worrisome.
Sorry, I thought I had responded to you. The problem is that you're committing in batches of 1000 in LOAD CSV and in a single batch of 12613 in the R code, so it's not really a fair comparison. Can you commit in batches of 1000 in your R code and get back to me?
OK, will do that and get back to you.
Any workaround for this? - Thanks
Apologies for the long delay; some other projects took my time. Back to the problem at hand: I ran the following code, which loads the same data as the LOAD CSV command in Cypher.
library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")
clear(graph)

setwd("/home/bigdata/data/")
data <- read.table(file = "SPRINT3-a-v1.csv", sep = ",", header = TRUE)

addIndex(graph, "Person", "person_ID")
getIndex(graph)

query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'

t1 <- Sys.time()
tx = newTransaction(graph)
for (i in 1:nrow(data)) {
  if (i %% 1000 == 0) {
    # Commit the current transaction.
    commit(tx)
    print(paste("Batch:", i / 1000, "committed."))
    # Open a new transaction.
    tx = newTransaction(graph)
  }
  person_ID = data[i, ]$person_id1
  appendCypher(tx, query, person_ID = person_ID)
}
commit(tx)
print("Last batch committed.")
print("All done!")
t2 <- Sys.time()
t2 - t1
I think this makes a fair comparison (loading the data in batches of 1000), but I still get 3 minutes for the operation. Apologies if this code is wrong and I have not understood the concept well.
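One variant that might be worth trying, to stay on the transactional endpoint but cut the one-statement-per-row overhead, is to ship each batch as a single UNWIND statement. This is only an untested sketch: it assumes RNeo4j serializes an R list of named lists into a JSON array of maps for the {rows} parameter, and the batch size of 1000 is arbitrary.

library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")

# One parameterized statement per batch; the rows are unwound server-side
# instead of being appended as one statement per row.
query = 'UNWIND {rows} AS row MERGE (a:Person { person_ID: row.person_ID, source: "a" })'

ids = as.character(data$person_id1)
batches = split(ids, ceiling(seq_along(ids) / 1000))

t1 <- Sys.time()
for (batch in batches) {
  # Build a list of maps, one per row, to bind to the {rows} parameter.
  rows = lapply(batch, function(id) list(person_ID = id))
  tx = newTransaction(graph)
  appendCypher(tx, query, rows = rows)
  commit(tx)
}
t2 <- Sys.time()
t2 - t1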
I have the same problem. Neither createNode() / createRel() nor appendCypher() is fast enough to use. My workaround is to use getNode() and cypher() with normal queries, and to create CSV files and import them via LOAD CSV. Both have the disadvantage that the R code is not really understandable if the reader doesn't know what Cypher/Neo4j is, and creating the CSV files needs storage.
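For reference, a minimal sketch of that CSV workaround: the file path is an assumption and must be readable by the Neo4j server process, and USING PERIODIC COMMIT is left out because it can be rejected when sent through the transactional endpoint, depending on the Neo4j version.

library(RNeo4j)

graph = startGraph("http://localhost:7474/db/data/")

# Write the data frame to a location the Neo4j server can read
# (hypothetical path; adjust to your setup).
csv_path = "/home/data/persons_tmp.csv"
write.csv(data, file = csv_path, row.names = FALSE)

# Let the server do the import; the LOAD CSV statement is issued from R via cypher().
query = paste0(
  "LOAD CSV WITH HEADERS FROM 'file://", csv_path, "' AS row ",
  'MERGE (a:Person { person_ID: row.person_id1, source: "a" })'
)
cypher(graph, query)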
Thanks for your hard work.
Sorry, I don't think the transactional endpoint will ever be as fast as LOAD CSV or neo4j-import. createNode() and createRel() definitely won't be as fast, as they create nodes / relationships one at a time in a single transaction.