
Adjust the batchimport to the new features

Open peterneubauer opened this issue 11 years ago • 2 comments

Hi there, I imported the MusicBrainz database into Neo4j using the following approach, with help from @jexp:

Define two indexes in batch.properties: an exact index named mbid for the MBIDs, and a fulltext index named mb for everything else:

batch_import.keep_db=false
batch_import.mapdb_cache.disable=true
batch_import.node_index.mb=fulltext
batch_import.node_index.mbid=exact
batch_import.csv.quotes=false
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M

Then put the indexing instructions directly into the headers of the nodes.csv and rels.csv files, so we don't need the ...index.csv files anymore; see https://github.com/jexp/batch-import -> automatic indexing

kind:string:mb  comment status  position    name:string:mb  area    gender  format  barcode number  ended   length  end_date_year   begin_date_year mbid:string:mbid    type:string:mb  pk
artist              Talkshow Boy                        f               e8d94cf5-fafa-48fc-a6fa-aa50cf54d7f3        288762
artist              Vibulator                       f               735bfaad-6eb1-4f9c-b21d-cbaef7c79a92        97944
artist              Eat Me                      f               c38a93e8-2ecf-4848-b1d2-364202d9dc0c    Group   499198
artist              Uffe Andersen                       f               a7f3c871-3ba3-40b1-ba58-d08b40312789    Person  514886
artist              Headust                     f               eda60727-7036-437b-b53d-ae472818ee3a        212148
artist              Sons Of The Subway                      f               232d5716-c2b2-47e1-aa0c-264ec69e6a18        100774
artist              The Poe Boy Family                      f               672d599e-6a6c-456e-98ba-dac5a45e3ed8        43132
artist              Ralph Gusovius  Germany Male                f           1950    6ecfcea1-677d-427b-a38b-9c76ce92e313    Person  295052
artist              Elastik Band                        f               46e0639c-1ccf-45f5-b886-4cbf5549a2a1        61467
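
As a rough sketch of how an exporter could produce such a header, the Python below appends ":type:index" to the columns that should be indexed and leaves the others plain. The function names (annotate_header, write_nodes), the INDEXED_COLUMNS mapping, and the tab delimiter are assumptions for illustration, not existing sql2graph code.

```python
import io

# Assumption: these columns get a "name:type:index" header, mirroring the
# sample header above; every other column keeps its plain name.
INDEXED_COLUMNS = {
    "kind": ("string", "mb"),
    "name": ("string", "mb"),
    "type": ("string", "mb"),
    "mbid": ("string", "mbid"),
}


def annotate_header(columns, indexed=INDEXED_COLUMNS):
    """Return the header fields, adding ":type:index" to indexed columns."""
    header = []
    for column in columns:
        if column in indexed:
            value_type, index_name = indexed[column]
            header.append(f"{column}:{value_type}:{index_name}")
        else:
            header.append(column)
    return header


def write_nodes(columns, rows, out):
    """Write a tab-separated node file; missing properties become empty fields."""
    out.write("\t".join(annotate_header(columns)) + "\n")
    for row in rows:
        out.write("\t".join(str(row.get(c, "")) for c in columns) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_nodes(
        ["kind", "name", "mbid", "type", "pk"],
        [{"kind": "artist", "name": "Eat Me",
          "mbid": "c38a93e8-2ecf-4848-b1d2-364202d9dc0c",
          "type": "Group", "pk": 499198}],
        buf,
    )
    print(buf.getvalue(), end="")
```

No quoting is applied here, which matches batch_import.csv.quotes=false above, so values must not contain the delimiter themselves.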

And then import the two files with something like:

java -Xmx10G -server -Dfile.encoding=UTF-8 -jar ~/neo/batch-import/target/batch-import-jar-with-dependencies.jar ./graph.db nodes.csv rels.csv 
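
If the export script were to start the import itself, the same command line could be assembled in Python. This is only a sketch: build_import_command is a hypothetical helper, and the jar path is whatever your local batch-import build produced.

```python
import os


def build_import_command(jar, store_dir, nodes_file, rels_file, heap="10G"):
    """Assemble the argument list for the batch-import command shown above."""
    return [
        "java",
        f"-Xmx{heap}",
        "-server",
        "-Dfile.encoding=UTF-8",
        "-jar",
        os.path.expanduser(jar),  # expand "~" since no shell does it for us
        store_dir,
        nodes_file,
        rels_file,
    ]


if __name__ == "__main__":
    cmd = build_import_command(
        "~/neo/batch-import/target/batch-import-jar-with-dependencies.jar",
        "./graph.db",
        "nodes.csv",
        "rels.csv",
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to actually run the import.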

WDYT? It would make the output a lot simpler, and the import took about 10 minutes on my machine for 160M properties and 75M relationships.

peterneubauer, Aug 17 '13 18:08