TopoBenchmark
TopoBenchmark is a Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning
A Comprehensive Benchmark Suite for Topological Deep Learning
Assess how your model compares against state-of-the-art topological neural networks.
Overview • Get Started • Tutorials • Neural Networks • Liftings • Datasets • References
:pushpin: Overview
TopoBenchmarkX (TBX) is a modular Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning (TDL). In particular, TBX allows you to train and compare the performance of a wide range of Topological Neural Networks (TNNs) across different topological domains, where by topological domain we refer to a graph, a simplicial complex, a cellular complex, or a hypergraph. For detailed information, please refer to the paper TopoBenchmarkX: A Framework for Benchmarking Topological Deep Learning.
The main pipeline trains and evaluates a wide range of state-of-the-art TNNs and Graph Neural Networks (GNNs) (see :gear: Neural Networks) on numerous and varied datasets and benchmark tasks (see :books: Datasets).
Additionally, the library offers the ability to transform, i.e. lift, each dataset from one topological domain to another (see :rocket: Liftings), enabling for the first time an exhaustive inter-domain comparison of TNNs.
:jigsaw: Get Started
Create Environment
If you do not have conda on your machine, please follow their guide to install it.
First, clone the TopoBenchmarkX repository and set up a conda environment tbx with Python 3.11.3.
git clone [email protected]:pyt-team/topobenchmarkx.git
cd TopoBenchmarkX
conda create -n tbx python=3.11.3
Next, check the CUDA version of your machine:
/usr/local/cuda/bin/nvcc --version
and ensure that it matches the CUDA version specified in the env_setup.sh file (CUDA=cu121 by default). If it does not match, update env_setup.sh accordingly by changing both the CUDA and TORCH environment variables to compatible values as specified on this website.
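As a rough illustration (an assumption about the file's layout, not a verbatim excerpt), these typically look like plain shell variable assignments near the top of env_setup.sh; for example, on a machine with CUDA 11.8 you might set:
CUDA=cu118
TORCH=2.0.1
Pick a CUDA/PyTorch pair for which prebuilt wheels exist, as listed on the PyTorch installation page.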
Next, set up the environment with the following command.
source env_setup.sh
This command installs the TopoBenchmarkX library and its dependencies.
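As an optional sanity check (assuming the installed package shares its import name with the module invoked in the next step), activate the environment and try importing the library; the command exits silently on success:
conda activate tbx
python -c "import topobenchmarkx"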
Run Training Pipeline
Next, train the neural networks by running the following command:
python -m topobenchmarkx
Thanks to the hydra configuration framework, one can easily override the default experiment configuration through the command line. For instance, the model and dataset can be selected as:
python -m topobenchmarkx model=cell/cwn dataset=graph/MUTAG
Remark: By default, our pipeline identifies the source and destination topological domains, and applies a default lifting between them if required.
The same CLI override mechanism also applies when modifying finer configurations within a CONFIG GROUP. Please refer to the official hydra documentation for further details.
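For illustration, a couple of hypothetical overrides are sketched below; the group names (model=..., dataset=..., trainer) mirror the configs/ folders listed at the end of this README, but the specific config file names and nested keys are assumptions and may not match the actual schema:
python -m topobenchmarkx model=graph/gcn dataset=graph/PROTEINS
python -m topobenchmarkx model=cell/cwn dataset=graph/MUTAG trainer.max_epochs=50 seed=42
The first command swaps in a different model/dataset pair, while the second uses dotted paths to override individual keys nested inside a config group.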
:bike: Experiments Reproducibility
To reproduce Table 1 from the TopoBenchmarkX: A Framework for Benchmarking Topological Deep Learning paper, please run the following command:
bash scripts/reproduce.sh
Remark: We have additionally provided a public W&B (Weights & Biases) project with logs for the corresponding runs (updated on June 11, 2024).
:anchor: Tutorials
Explore our tutorials for further details on how to add new datasets, transforms/liftings, and benchmark tasks.
:gear: Neural Networks
We list the neural networks trained and evaluated by TopoBenchmarkX, organized by the topological domain over which they operate: graph, simplicial complex, cellular complex, or hypergraph. Many of these neural networks were originally implemented in TopoModelX.
Graphs
Model | Reference |
---|---|
GAT | Graph Attention Networks |
GIN | How Powerful are Graph Neural Networks? |
GCN | Semi-Supervised Classification with Graph Convolutional Networks |
Simplicial complexes
Cellular complexes
Model | Reference |
---|---|
CAN | Cell Attention Network |
CCCN | Inspired by A learning algorithm for computational connected cellular network, implementation adapted from Generalized Simplicial Attention Neural Networks |
CXN | Cell Complex Neural Networks |
CWN | Weisfeiler and Lehman Go Cellular: CW Networks |
Hypergraphs
:rocket: Liftings
We list the liftings used in TopoBenchmarkX to transform datasets. Here, a lifting refers to a function that transforms a dataset defined on a topological domain (e.g., on a graph) into the same dataset but supported on a different topological domain (e.g., on a simplicial complex).
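As a rough sketch of how a specific lifting could be requested explicitly from the command line (the config name mirrors the configs/transforms/liftings folder shown at the end of this README, but the exact override syntax is an assumption rather than a documented interface):
python -m topobenchmarkx model=cell/cwn dataset=graph/MUTAG transforms=liftings/graph2cell_default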
Topology Liftings
Graph2Simplicial
Name | Description | Reference |
---|---|---|
CliqueLifting | The algorithm finds the cliques in the graph and creates simplices. Given a clique, the first simplex added is the one containing all the nodes of the clique; then the simplices composed of all the possible combinations with one node missing, then two nodes missing, and so on, until all the possible pairs are added. Then the method moves to the next clique. | Simplicial Complexes |
KHopLifting | For each node in the graph, take the set of its neighbors, up to k distance, and the node itself. These sets are then treated as simplices. The dimension of each simplex depends on the degree of the nodes. For example, a node with d neighbors forms a d-simplex. | Neighborhood Complexes |
Graph2Cell
Name | Description | Reference |
---|---|---|
CellCycleLifting | To lift a graph to a cell complex (CC), we proceed as follows. First, we identify a finite set of cycles (closed loops) within the graph. Second, each identified cycle is associated with a 2-cell, such that the boundary of the 2-cell is the cycle. The nodes and edges of the cell complex are inherited from the graph. | Appendix B |
Graph2Hypergraph
Name | Description | Reference |
---|---|---|
KHopLifting | For each node in the graph, the algorithm finds the set of nodes that are at most k connections away from the initial node. This set is then used to create a hyperedge. The process is repeated for all nodes in the graph. | Section 3.4 |
KNearestNeighborsLifting | For each node in the graph, the method finds the k nearest nodes using the Euclidean distance between the feature vectors. The set of k nodes found is treated as a hyperedge. The process is repeated for all nodes in the graph. | Section 3.1 |
Feature Liftings
Name | Description | Supported Domains |
---|---|---|
ProjectionSum | Projects r-cell features to (r+1)-cell structures using the incidence matrices B_{r} (see the sketch after this table). | Simplicial, Cell |
ConcatenationLifting | Concatenate r-cell features to obtain r+1-cell features. | Simplicial |
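As a minimal sketch of the ProjectionSum rule above, assuming the convention that B_{r+1} denotes the incidence matrix between r-cells and (r+1)-cells, each (r+1)-cell receives the sum of the features of the r-cells on its boundary:
X_{r+1} = |B_{r+1}|^T X_r, where X_r stacks the r-cell feature vectors row-wise.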
:books: Datasets
Dataset | Task | Description | Reference |
---|---|---|---|
Cora | Classification | Cocitation dataset. | Source |
Citeseer | Classification | Cocitation dataset. | Source |
Pubmed | Classification | Cocitation dataset. | Source |
MUTAG | Classification | Graph-level classification. | Source |
PROTEINS | Classification | Graph-level classification. | Source |
NCI1 | Classification | Graph-level classification. | Source |
NCI109 | Classification | Graph-level classification. | Source |
IMDB-BIN | Classification | Graph-level classification. | Source |
IMDB-MUL | Classification | Graph-level classification. | Source |
| | Classification | Graph-level classification. | Source |
Amazon | Classification | Heterophilic dataset. | Source |
Minesweeper | Classification | Heterophilic dataset. | Source |
Empire | Classification | Heterophilic dataset. | Source |
Tolokers | Classification | Heterophilic dataset. | Source |
US-county-demos | Regression | Each node attribute is used in turn as the target label. | Source |
ZINC | Regression | Graph-level regression. | Source |
:hammer_and_wrench: Development
To join the development of TopoBenchmarkX, you should install the library in dev mode. For this, you can create an environment using either Conda or Docker. Both options are detailed below.
:snake: Using Conda Environment
Follow the steps in :jigsaw: Get Started.
:whale: Using Docker
For ease of use, TopoBenchmarkX employs Docker. To set it up on your system, you can follow their guide. Once installed, please follow the next steps:
First, clone the repository and navigate to the correct folder.
git clone [email protected]:pyt-team/topobenchmarkx.git
cd TopoBenchmarkX
Then, build the Docker image.
docker build -t topobenchmarkx:new .
Depending on whether you want to use GPUs or not, run one of the following commands to start the Docker image and mount the current directory.
With GPUs
docker run -it -d --gpus all --volume $(pwd):/TopoBenchmarkX topobenchmarkx:new
With CPU
docker run -it -d --volume $(pwd):/TopoBenchmarkX topobenchmarkx:new
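Once the container is running, you can attach a shell to it and work inside the mounted repository (these are standard Docker commands, nothing specific to this project). First list the running containers to get the container ID, then open an interactive shell in it:
docker ps
docker exec -it <container_id> /bin/bash
Inside the container, the repository is available under /TopoBenchmarkX, the mount point used in the commands above.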
Happy development!
:mag: References
To learn more about TopoBenchmarkX, we invite you to read the paper:
@misc{topobenchmarkx2024,
title={TopoBenchmarkX},
author={PyT-Team},
year={2024},
eprint={TBD},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
If you find TopoBenchmarkX useful, we would appreciate it if you cite us!
:mouse: Additional Details
Hierarchy of configuration files
├── configs <- Hydra configs
│ ├── callbacks <- Callbacks configs
│ ├── dataset <- Dataset configs
│ │ ├── graph <- Graph dataset configs
│ │ ├── hypergraph <- Hypergraph dataset configs
│ │ └── simplicial <- Simplicial dataset configs
│ ├── debug <- Debugging configs
│ ├── evaluator <- Evaluator configs
│ ├── experiment <- Experiment configs
│ ├── extras <- Extra utilities configs
│ ├── hparams_search <- Hyperparameter search configs
│ ├── hydra <- Hydra configs
│ ├── local <- Local configs
│ ├── logger <- Logger configs
│ ├── loss <- Loss function configs
│ ├── model <- Model configs
│ │ ├── cell <- Cell model configs
│ │ ├── graph <- Graph model configs
│ │ ├── hypergraph <- Hypergraph model configs
│ │ └── simplicial <- Simplicial model configs
│ ├── optimizer <- Optimizer configs
│ ├── paths <- Project paths configs
│ ├── scheduler <- Scheduler configs
│ ├── trainer <- Trainer configs
│ ├── transforms <- Data transformation configs
│ │ ├── data_manipulations <- Data manipulation transforms
│ │ ├── dataset_defaults <- Default dataset transforms
│ │ ├── feature_liftings <- Feature lifting transforms
│ │ └── liftings <- Lifting transforms
│ │ ├── graph2cell <- Graph to cell lifting transforms
│ │ ├── graph2hypergraph <- Graph to hypergraph lifting transforms
│ │ ├── graph2simplicial <- Graph to simplicial lifting transforms
│ │ ├── graph2cell_default.yaml <- Default graph to cell lifting config
│ │ ├── graph2hypergraph_default.yaml <- Default graph to hypergraph lifting config
│ │ ├── graph2simplicial_default.yaml <- Default graph to simplicial lifting config
│ │ ├── no_lifting.yaml <- No lifting config
│ │ ├── custom_example.yaml <- Custom example transform config
│ │ └── no_transform.yaml <- No transform config
│ ├── wandb_sweep <- Weights & Biases sweep configs
│ │
│ ├── __init__.py <- Init file for configs module
│ └── run.yaml <- Main config for training
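To inspect how these groups compose into a single run configuration, Hydra's standard --cfg flag prints the resolved config without launching training (this relies on generic Hydra behavior, not on anything project-specific):
python -m topobenchmarkx --cfg job
python -m topobenchmarkx model=cell/cwn dataset=graph/MUTAG --cfg job
The second variant shows the configuration that would result from the given overrides.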
More information regarding Topological Deep Learning
Topological Graph Signal Compression
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
TopoX: a suite of Python packages for machine learning on topological domains