batch-import
edge list support?
Hi,
The current importer needs numerical node IDs to work, which correspond to the line numbers in the node list. But what should one do when the data is just an edge list? e.g.:
Source, Target
Michael, Selina
Rana, Selma
Michael, Selma
hi @sheymann you'll have to prepare your data a bit beforehand.
For example, you could:
- generate a list of all distinct node names (Michael, Selina, Rana, Selma...)
- assign them a sequential incremental ID, starting from 1
- write your nodes.csv file with the nodes in the same sequence order, similarly to
USERNAME
Michael
Selina
Rana
Selma
...
you can add additional columns/properties to your nodes if necessary
- write your relations.csv file with at least 3 columns: a source node, a target node, and relation type
The first 2 columns should reference the nodes using the sequential ID you chose before.
For the 3rd column, I'm assuming simple friendship:
SOURCE TARGET RELTYPE
1 2 friend
3 4 friend
1 4 friend
...
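The steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not part of batch-import: the function names `read_edge_list`, `edge_list_to_batch_files` and `write_tsv` are made up for illustration, and the tab delimiter in the output files is an assumption, so adjust it to whatever your importer setup expects.

```python
import csv
import io


def read_edge_list(text):
    """Parse a 'Source, Target' edge list with one header line into name pairs."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the "Source, Target" header line
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def edge_list_to_batch_files(edges, rel_type="friend"):
    """Assign sequential IDs to distinct node names and rewrite edges with them."""
    ids = {}  # node name -> sequential ID, in order of first appearance
    relations = []
    for source, target in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = len(ids) + 1  # IDs start at 1, as in the steps above
        relations.append((ids[source], ids[target], rel_type))
    # dicts keep insertion order (Python 3.7+), so the position of a name
    # in this list matches its ID
    nodes = list(ids)
    return nodes, relations


def write_tsv(path, header, rows):
    """Write rows to a delimited file (tab delimiter is an assumption)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

With the example edge list from the question, `edge_list_to_batch_files(read_edge_list(text))` gives the nodes in the order Michael, Selina, Rana, Selma and the same three ID pairs as in the relations table above. The files could then be written with `write_tsv("nodes.csv", ["USERNAME"], [[n] for n in nodes])` and `write_tsv("relations.csv", ["SOURCE", "TARGET", "RELTYPE"], relations)`. Note that the name-to-ID dictionary lives entirely in memory, which may become a problem for very large edge lists.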
Hope this helps
hm... looking at your profile @sheymann and your work on http://linkurio.us/
I guess your point was more about batch-import supporting only an edge-list as input than how to convert the input data (something you surely have all figured out)
Anyway, it may help others
Hey, yes, my question was focused on pure edge lists, as many complex network datasets are encoded this way.
@redapple Thanks for chiming in. I think it would make sense to also support edge-only CSV data and to allow using indexable keys in the start/end columns. Just thought about using https://github.com/jankotek/MapDB as an in-memory cache.
@sheymann Would you then just leave off the node file and assume that it is meant this way? This would probably also mean supporting multiple relationship files, since with a single file only one property-value mapping for nodes could be realized.
Well, this is an extreme case where we only know the graph structure, and we don't care about node properties (we may have edge properties though) :) e.g. all of these datasets: http://snap.stanford.edu/data/
btw, I started a Python helper module to export RDB data dumps into Neo4j: https://github.com/redapple/sql2graph
For now, it uses quite a lot of memory (when experimenting with MusicBrainz data)
I could write something similar to convert pure edge lists into nodes.csv, rels.csv, index.csv... but it'd be in Python ;) and having that support directly in batch-import would be easier/cleaner