apoc.periodic.iterate exceeds heap memory in Neo4j 4.0.0 with APOC 4.0.0.3
Issue by jialudeng
Thursday Feb 20, 2020 at 06:25 GMT
Originally opened as https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/1418
Expected Behavior (Mandatory)
50 million new Picture nodes and 50 million new relationships, pointing to the 10 million existing Listing nodes, are created.
Actual Behavior (Mandatory)
Only about 760k nodes and relationships were created before the heap ran out. I later ran the same queries with the same configuration on Neo4j 3.4.14 and APOC 3.4.0.8, and all 50 million nodes and relationships were created successfully in 1,587,348 ms (about 26.5 minutes).
How to Reproduce the Problem
I ran the queries, each wrapped in apoc.periodic.iterate, in Neo4j Desktop 1.2.4 with Neo4j 4.0.0 and APOC 4.0.0.3.
As suggested in the Neo4j community, I increased the memory settings in neo4j.conf:
dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=8G
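To confirm that the running 4.0.0 instance actually picked up these values, the settings can be listed from Cypher. A minimal sketch, assuming the dbms.listConfig procedure available in Neo4j 4.0 and a user allowed to call it:

```cypher
// List the memory-related settings the running instance is using.
// dbms.listConfig keeps only settings whose name contains the given string.
CALL dbms.listConfig('dbms.memory')
YIELD name, value
RETURN name, value
ORDER BY name;
```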
First, I loaded the Listing nodes from CSV:
// Stream listings.csv and create one Listing node per row,
// committing 10,000 rows per transaction, with batches run in parallel.
CALL apoc.periodic.iterate("
  CALL apoc.load.csv('file:///listings.csv', {
    mapping: {
      id:      {type: 'int'},
      beds:    {type: 'int'},
      price:   {type: 'int'},
      score:   {type: 'float'},
      reviews: {type: 'int'}
    }
  }) YIELD map AS row RETURN row
", "
  CREATE (l:Listing) SET l = row
", {batchSize: 10000, iterateList: true, parallel: true});
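Before the second phase, it may help to confirm that this load produced the expected 10 million Listing nodes, since every Picture row depends on finding its listing. A minimal check:

```cypher
// Count Listing nodes; a bare label count is answered from the count store,
// so it does not scan the nodes themselves.
MATCH (l:Listing)
RETURN count(l) AS listings;
```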
Next, I created a unique constraint on the id property of Listing nodes:
CREATE CONSTRAINT ON (listing:Listing) ASSERT listing.id IS UNIQUE
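A unique constraint is backed by an index, and the relationship load below relies on that index to find each Listing quickly. A minimal sketch for checking that the index exists and is online before starting, assuming the Neo4j 4.0 column names of db.indexes():

```cypher
// Show the index(es) on :Listing, including the one backing the unique constraint.
// In Neo4j 4.0, db.indexes() reports each index's state and uniqueness.
CALL db.indexes()
YIELD name, state, uniqueness, labelsOrTypes, properties
WHERE 'Listing' IN labelsOrTypes
RETURN name, state, uniqueness, properties;
```

An index on `id` with state `ONLINE` and uniqueness `UNIQUE` is what the second phase would expect to use.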
Finally, I loaded the Picture nodes from CSV and connected each one to its Listing. This step exhausted the heap:
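// Stream pictures.csv; for each row, create a Picture node and link it to the
// Listing whose id equals the row's listing value. Batches run sequentially.
CALL apoc.periodic.iterate("
  CALL apoc.load.csv('file:///pictures.csv', {
    mapping: {
      id:      {type: 'int'},
      listing: {type: 'int'}
    }
  }) YIELD map AS row RETURN row
", "
  CREATE (p:Picture) SET p = row
  WITH p
  MATCH (l:Listing)
  WHERE p.listing = l.id
  CREATE (p)-[:PICTURE_OF]->(l)
", {batchSize: 10000, parallel: false, iterateList: true});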
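One way to narrow this down is to look at how Neo4j 4.0.0 plans the inner statement for a single row, in particular whether the `MATCH (l:Listing) WHERE p.listing = l.id` lookup uses the unique index (a NodeUniqueIndexSeek operator) or scans every Listing node (NodeByLabelScan followed by a Filter). A minimal sketch; the map literal is a hypothetical stand-in for one CSV record, and EXPLAIN only plans the query without writing anything:

```cypher
// EXPLAIN returns the execution plan without running the query,
// so no Picture node or relationship is created.
// The map literal below is a hypothetical single CSV row.
EXPLAIN
WITH {id: 1, listing: 1} AS row
CREATE (p:Picture) SET p = row
WITH p
MATCH (l:Listing)
WHERE p.listing = l.id
CREATE (p)-[:PICTURE_OF]->(l);
```

Running the same EXPLAIN on 3.4.14 would show whether the plans differ between the two versions. This report does not establish whether the plan is what drives the heap growth.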
Specifications (Mandatory)
Currently used versions
Versions
- OS: macOS Catalina 10.15.3
- Neo4j: 4.0.0
- Neo4j-Apoc: 4.0.0.3