
[converter] allow '#' as comment character in edge list parsing

Open heshpdx opened this issue 2 years ago • 1 comments

Many people download graphs from the Stanford SNAP database. These are edge list files (.el), but they contain a few comment lines as a header. Manually splicing out the first four lines of a 32GB file is painful, so here is a quick way to let converter work on the file without modification.

Sample test and output:

$ cat > foo.el
# Undirected graph: ../../data/output/friendster.txt
# Friendster
# Nodes: 65608366 Edges: 1806067135
# FromNodeId    ToNodeId
101     102
121     104
131     107
141     125
101     165
101     168
151     170
101     176
161     180
101     181
191     182
102     209
103     210
101     248
101     306
104     329
105     330
106     340
^D

$ ./converter -f foo.el -b foo.sg
# ignoring comment
# ignoring comment
# ignoring comment
# ignoring comment
Read Time:           0.00448
Build Time:          0.00343
Graph has 341 nodes and 18 directed edges for degree: 0
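
For readers who want to see the idea in isolation, here is a minimal sketch of comment skipping in an edge list reader. It is not the code from this patch or from GAPBS itself: the `Edge` alias and the function name `ReadEdgeListSkippingComments` are hypothetical, and the sketch assumes two whitespace-separated integer IDs per line, with any line whose first non-blank character is '#' treated as a comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge type for this sketch; GAPBS defines its own edge types.
using Edge = std::pair<int64_t, int64_t>;

// Reads "u v" pairs, one per line. Blank lines and lines whose first
// non-whitespace character is '#' are skipped. Lines that do not start
// with two integers are also skipped silently (an assumption of this sketch).
std::vector<Edge> ReadEdgeListSkippingComments(std::istream& in) {
  std::vector<Edge> edges;
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos || line[first] == '#') continue;
    std::istringstream fields(line);
    int64_t u, v;
    if (fields >> u >> v) edges.emplace_back(u, v);
  }
  return edges;
}
```

The check costs one scan for the first non-blank character per line, so it should be negligible next to the I/O of reading a multi-gigabyte file.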

heshpdx, Mar 17 '22 00:03

It's been a few months. Just checking, is this worthy of merging to mainline? Thanks!

heshpdx, Aug 17 '22 15:08

Sorry for the delay!

Thank you for the PR!

Although SNAP is commonly used, this change adds complexity. GAPBS relies on file suffixes alone to identify file types, and we currently don't go near .txt (commonly used on SNAP) since it could mean so many things.

For cases like this, I recommend filtering out those comment lines: grep -v '#' WikiVote.txt > WikiVote.el

sbeamer, Nov 05 '22 03:11

That is unfortunate, since the example grep command would take a long time for large graphs. Instead, with the patch above, the user could just run... mv WikiVote.txt WikiVote.el ...and have it be ready to process.

heshpdx, Jan 26 '23 15:01

With pipes, the grep command takes no extra time. The file is still read once, and the process is still IO-bound.

sbeamer, Jan 26 '23 21:01