
No results are returned by `apoc.export.json.all` with `writeNodeProperties: true` for a large dataset when writing to a stream

Open · elan-sfrancies opened this issue 1 year ago · 3 comments

I am experiencing issues when running the following query against an on-premises Docker instance of Neo4j Community Edition.

CALL apoc.export.json.all(null, {stream:true, jsonFormat: "JSON_LINES", writeNodeProperties: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data

Expected Behavior

The query returns results or displays an error showing why results could not be returned.

Actual Behavior

No results are returned (the following message is seen in the web interface):

(no changes, no records)

How to Reproduce the Problem

Steps

  1. Generate a Neo4j instance with ~100,000 nodes, ~100,000 relationships and ~17,000,000 properties (a sketch for generating a comparable dataset is included after these steps).
  2. Run the following query:
CALL apoc.export.json.all(null, {stream:true, jsonFormat: "JSON_LINES", writeNodeProperties: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data
  3. Observe that no results are returned.

I have tested this behaviour using both the .NET driver and the browser interface.
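
A minimal sketch (not part of the original report) of how a dataset of roughly this shape could be generated; the :TestNode label, the id/prop* property names and the :RELATED_TO relationship type are placeholders chosen for illustration:

// Index so the relationship pass can look nodes up by id
CREATE INDEX testnode_id IF NOT EXISTS FOR (n:TestNode) ON (n.id);

// ~100,000 nodes with ~170 properties each (~17,000,000 properties in total)
CALL apoc.periodic.iterate(
  "UNWIND range(1, 100000) AS i RETURN i",
  "CREATE (n:TestNode {id: i})
   SET n += apoc.map.fromPairs([k IN range(1, 170) | ['prop' + toString(k), 'value-' + toString(i) + '-' + toString(k)]])",
  {batchSize: 1000});

// ~100,000 relationships, one per node, forming a ring
CALL apoc.periodic.iterate(
  "MATCH (a:TestNode) RETURN a",
  "MATCH (b:TestNode {id: a.id % 100000 + 1}) CREATE (a)-[:RELATED_TO]->(b)",
  {batchSize: 1000});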

Screenshots

The results when writeNodeProperties: false:

[Screenshot: WriteNodePropertiesFalse]

The lack of results when writeNodeProperties: true:

[Screenshot: WriteNodePropertiesTrue]

Specifications

  • Memory: 30 GB (memory use appears to increase during the query, topping out at around 9.5 GB and then leveling off)
  • CPU: 20

Versions

  • OS: Docker for Windows (WSL) on Windows 10
  • Neo4j: neo4j:5.20-community-bullseye
  • Neo4j APOC: installed via NEO4J_PLUGINS=["apoc"] (latest)

elan-sfrancies avatar Jul 22 '24 09:07 elan-sfrancies

Hey! Thanks for writing in. I suspect this is an OOM, as APOC does not implement memory tracking. Are you able to check the debug.log file and see if there are any errors there? If so, can you send that here too?

Unfortunately, with how APOC is implemented, this isn't something that is easy for us to fix at this time. My suggestion would be to use one of the other export.json procedures that let you feed the data in using a Cypher query, so you can control how much data is consumed at a given time.
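
A sketch of that approach (reusing the hypothetical :TestNode label and id property from the reproduction sketch above): export one bounded id range per call with apoc.export.json.query, then move on to the next range:

CALL apoc.export.json.query(
  "MATCH (n:TestNode) WHERE n.id >= 0 AND n.id < 10000 RETURN n",
  null,
  {stream: true, jsonFormat: "JSON_LINES", writeNodeProperties: true})
YIELD nodes, relationships, properties, data
RETURN nodes, relationships, properties, data;

// Repeat with the next id range (10,000-20,000, and so on) so that each call
// only materialises a bounded slice of the graph in memory.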

gem-neo4j avatar Jul 22 '24 11:07 gem-neo4j

I'm considering adding an option to truly stream the JSON lines, rather than putting them in a single massive string.

loveleif avatar May 06 '25 07:05 loveleif

...actually, it's a bug because the batchSize configuration is already documented.
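
For reference, a sketch of the same call with the documented batchSize export option set explicitly (the value here is illustrative):

CALL apoc.export.json.all(null,
  {stream: true, jsonFormat: "JSON_LINES", writeNodeProperties: true, batchSize: 1000})
YIELD nodes, relationships, properties, data
RETURN nodes, relationships, properties, data;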

loveleif avatar May 06 '25 07:05 loveleif