
Speeding up data import to Neo4j v5 and CSV format data

Open nickzren opened this issue 8 months ago • 2 comments

I ran into problems loading Hetionet data into Neo4j 5.13 on my updated MacBook. The existing Neo4j dumps are no longer compatible with this version, and importing the data directly from JSON was too slow, taking an estimated 10+ hours.

To address this, I've written a script that converts the JSON data to CSV without losing any node, edge, or property value information. The JSON-to-CSV conversion takes approximately 30 seconds, and loading the resulting CSV files into Neo4j takes around 40 seconds.
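For readers who want a feel for the conversion step, here is a minimal sketch and not the script from the linked branch. It assumes the JSON has a top-level `nodes` list whose entries carry `kind`, `identifier`, `name`, and a nested `data` dict of properties; the function name `nodes_to_csv_by_kind` is hypothetical. Nested lists and dicts are serialized as JSON strings so that no property value is dropped.

```python
import csv
import io
import json
from collections import defaultdict


def nodes_to_csv_by_kind(graph):
    """Return {kind: csv_text}, one CSV document per node kind.

    Assumes each node looks like:
    {"kind": ..., "identifier": ..., "name": ..., "data": {...}}
    """
    by_kind = defaultdict(list)
    for node in graph["nodes"]:
        # Put the core fields first, then flatten the property dict beside them.
        row = {"identifier": node["identifier"], "name": node["name"]}
        for key, value in node.get("data", {}).items():
            # Keep lists/dicts intact by storing them as JSON text.
            if isinstance(value, (list, dict)):
                value = json.dumps(value)
            row[key] = value
        by_kind[node["kind"]].append(row)

    result = {}
    for kind, rows in by_kind.items():
        # Header is the union of all property names, in first-seen order.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)  # missing properties become empty cells
        result[kind] = buffer.getvalue()
    return result
```

Each returned string can then be written to a file such as `Gene.csv`. Edges could be handled the same way, grouped by edge kind, with source and target identifiers as columns.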

I've organized each node and edge type into its own respective CSV file and accompanying Cypher script. I believe this will make it easier for people to understand and work with the data.
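As an illustration of pairing each CSV file with its own Cypher script, the sketch below generates a small load script for one node type. The helper name `node_load_script`, the `identifier` key, and the file name are assumptions for the example; the actual scripts in the branch may be organized differently. It relies on `LOAD CSV WITH HEADERS`, which reads every value as a string, so numeric properties would need explicit conversion in a real script.

```python
def node_load_script(kind, csv_filename):
    """Build a Cypher script that loads one node-type CSV into Neo4j 5.

    `csv_filename` is assumed to be a URL-safe name inside Neo4j's
    import directory (hypothetical example: "Gene.csv").
    """
    # Backtick-quote the label so kinds containing spaces stay valid Cypher.
    label = "`" + kind.replace("`", "``") + "`"
    return "\n".join([
        # A uniqueness constraint also creates the index that keeps MERGE fast.
        f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) "
        "REQUIRE n.identifier IS UNIQUE;",
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_filename}' AS row",
        f"MERGE (n:{label} {{identifier: row.identifier}})",
        # Copy every CSV column onto the node as a (string) property.
        "SET n += row;",
    ])


if __name__ == "__main__":
    print(node_load_script("Gene", "Gene.csv"))
```

Keeping one script per type means a single node or edge type can be reloaded or inspected without touching the rest of the graph.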

If this sounds useful, I'd be open to integrating these changes into the main branch. Let me know your thoughts.

You can find the revised code at: https://github.com/nickzren/hetionet/tree/csv

nickzren, Oct 27 '23 21:10