incubator-graphar

[Discussion] Gathering graph datasets to build a data hub in GraphAr format

Open · acezen opened this issue 9 months ago · 6 comments

To improve the capabilities of the GraphAr format, we plan to build a data hub of graph datasets stored in GraphAr format.

This issue is for gathering graph datasets that ideally meet the following requirements:

  • Not large-scale: each dataset can be loaded into memory on a single machine.
  • Task diversity: graph analysis, GNNs, knowledge graphs, etc.

Any comments, questions, and dataset suggestions are welcome!

acezen (Nov 07 '23)

LDBC SNB/BI dataset

acezen (Nov 09 '23)

What about the Stanford graph datasets (SNAP)? They include many networks of different types, ranging in size from kilobytes to gigabytes, and they are already grouped by task, such as community detection and graph classification.

SemyonSinchenko (Dec 30 '23)

That would be a great data source for GraphAr, thanks for the proposal!

acezen (Jan 08 '24)

We could consider the following graph datasets for the data hub:

  1. Property Graphs: The LDBC graphs feature a variety of vertex and edge types, each with associated properties of diverse data types. These graphs can be generated at various scales to suit different analysis needs.

  2. Simple Topological Graphs: The SNAP datasets offer a collection of real-world graphs from multiple domains, including social networks, web graphs, and road networks. Additionally, the Laboratory for Web Algorithmics provides a range of large-scale web graphs compressed using LLP + WebGraph. These can be particularly useful for evaluating the storage efficiency of GraphAr.

  3. Labeled Property Graphs: The neo4j-graph-examples repository contains graphs in Neo4j dump format that include vertex labels. Each vertex in these graphs may have multiple labels, which makes these graphs more complex to represent.

  4. GNN Graphs: The OGBN graphs are designed for node property prediction tasks, with the target labels represented as vertex labels. They are well suited for representing the graph structures used in GNN (Graph Neural Network) workloads.

Later, we may also consider RDF (Resource Description Framework) datasets, temporal graphs, and knowledge graphs.

lixueclaire (Jan 31 '24)

@acezen, do you have any more comments on this proposal?

lixueclaire (Jan 31 '24)

Looks good to me

acezen (Jan 31 '24)