batch-import edge list support?

Hi,

The current importer needs a numerical ID for each node, which corresponds to the node's line number in the node list. But what should we do when the data is just an edge list? E.g.:

Source, Target
Michael, Selina
Rana, Selma
Michael, Selma

Apr 15 '13 16:04 sheymann

Hi @sheymann, you'll have to prepare your data a bit beforehand.

For example, you could:

generate a list of all distinct node names (Michael, Selina, Rana, Selma...)
assign them a sequential incremental ID, starting from 1
write your nodes.csv file with the nodes listed in that same order, for example:

USERNAME
Michael
Selina
Rana
Selma
...

you can add additional columns/properties to your nodes if necessary

write your relations.csv file with at least 3 columns: a source node, a target node, and a relation type

The first 2 columns should reference the nodes using the sequential IDs you assigned before. For the 3rd column, I'm assuming simple friendship:

SOURCE  TARGET  RELTYPE
1   2   friend
3   4   friend
1   4   friend
...
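The steps above could be sketched in Python roughly as follows. This is only an illustration, not part of batch-import: the function names (`read_edge_list`, `assign_ids`, `write_tsv`) are made up for this example, and the tab-separated output with the USERNAME and SOURCE/TARGET/RELTYPE headers is an assumption based on the samples above.

```python
import csv


def read_edge_list(lines):
    """Parse 'Source, Target' lines into (source, target) tuples, skipping the header."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # drop the header row
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def assign_ids(edges, rel_type="friend"):
    """Give each distinct node name a sequential ID (starting from 1, in order
    of first appearance) and rewrite the edges in terms of those IDs."""
    ids = {}  # name -> ID; dicts keep insertion order (Python 3.7+)
    rels = []
    for source, target in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = len(ids) + 1
        rels.append((ids[source], ids[target], rel_type))
    # Node names in ID order, so line N of nodes.csv holds the node with ID N.
    return list(ids), rels


def write_tsv(path, header, rows):
    """Write a header and rows as tab-separated values (assumed file layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    sample = ["Source, Target", "Michael, Selina", "Rana, Selma", "Michael, Selma"]
    names, rels = assign_ids(read_edge_list(sample))
    write_tsv("nodes.csv", ["USERNAME"], [[name] for name in names])
    write_tsv("relations.csv", ["SOURCE", "TARGET", "RELTYPE"], rels)
```

For the three sample edges this gives Michael, Selina, Rana, Selma the IDs 1 to 4 and produces the relations (1, 2), (3, 4) and (1, 4), matching the tables above. Note that the whole name-to-ID mapping is held in memory, which may not suit very large edge lists.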

Hope this helps

May 21 '13 21:05 redapple

Hm... looking at your profile, @sheymann, and your work on http://linkurio.us/, I guess your point was more about batch-import supporting a plain edge list as input than about how to convert the input data (something you surely have all figured out). Anyway, it may help others.

May 21 '13 22:05 redapple

Hey, yes, my question was focused on pure edge lists, as many complex network datasets are encoded this way.

May 21 '13 22:05 sheymann

@redapple Thanks for chiming in. I think it would make sense to also support edge-only CSV data, and to allow indexable keys in the start/end columns. I just thought about using https://github.com/jankotek/MapDB as an in-memory cache.

May 21 '13 22:05 jexp

@sheymann Would you then just leave off the node file and have the importer assume that an edge-only import is meant? This would probably also mean supporting multiple relationship files, since with a single file only one property-value mapping for nodes could be realized.

May 21 '13 22:05 jexp

Well, this is an extreme case where we only know the graph structure, and we don't care about node properties (we may have edge properties though) :) e.g. all of these datasets: http://snap.stanford.edu/data/

May 21 '13 22:05 sheymann

BTW, I started a Python helper module to export RDB data dumps into Neo4J: https://github.com/redapple/sql2graph. For now, it uses quite a lot of memory (when experimenting with MusicBrainz data).

I could write something similar to convert pure edge lists into nodes.csv, rels.csv, index.csv... but it'd be in Python ;) and having that support directly in batch-import would be easier/cleaner.

May 21 '13 22:05 redapple
