# Total Variation Graph Neural Networks
TensorFlow and PyTorch implementation of the Total Variation Graph Neural Network (TVGNN), as presented in the original paper at ICML 2023.
The TVGNN model can be used to cluster the vertices of an annotated graph by accounting for both the graph topology and the vertex features. Compared to other GNNs for clustering, TVGNN produces sharp cluster assignments that better approximate the optimal partition in the minimum cut sense.

The TVGNN model can also be used to implement graph pooling in a deep GNN architecture for tasks such as graph classification.
## Model description
TVGNN consists of the GTVConv layer and the AsymCheegerCutPool layer.
### GTVConv
The GTVConv layer is a message-passing layer that minimizes the $L_1$-norm of the difference between features of adjacent nodes. The $l$-th GTVConv layer updates the node features as
$$\mathbf{X}^{(l+1)} = \sigma\left[ \left( \mathbf{I} - 2\delta \mathbf{L}_\Gamma^{(l)} \right) \mathbf{X}^{(l)}\mathbf{\Theta} \right] $$
where $\sigma$ is a non-linearity, $\mathbf{\Theta}$ are the trainable weights of the layer, and $\delta$ is a hyperparameter. $\mathbf{L}_\Gamma^{(l)}$ is a Laplacian defined as $\mathbf{L}_\Gamma^{(l)} = \mathbf{D}_\Gamma^{(l)} - \mathbf{\Gamma}^{(l)}$, where $\mathbf{D}_\Gamma = \text{diag}(\mathbf{\Gamma} \boldsymbol{1})$ and
$$[\mathbf{\Gamma}^{(l)}]_{ij} = \frac{a_{ij}}{\max\{ \lVert \boldsymbol{x}_i^{(l)} - \boldsymbol{x}_j^{(l)} \rVert_1, \epsilon \}}$$
where $a_{ij}$ is the $ij$-th entry of the adjacency matrix, $\boldsymbol{x}_i^{(l)}$ is the feature vector of vertex $i$ at layer $l$, and $\epsilon$ is a small constant that avoids division by zero.
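As a minimal illustration, the update above can be written for a single graph with a dense adjacency matrix as follows; this is a sketch in plain PyTorch, while the layers shipped in this repository additionally handle sparse and batched inputs:

```python
import torch

def gtv_conv(X, A, Theta, delta=1.0, eps=1e-3):
    # Illustrative dense GTVConv step; the defaults for delta and eps are arbitrary.
    # X: (N, F) node features, A: (N, N) adjacency, Theta: (F, F_out) weights.
    dist = torch.cdist(X, X, p=1).clamp(min=eps)    # max(||x_i - x_j||_1, eps)
    Gamma = A / dist                                # [Gamma]_ij = a_ij / max(...)
    L = torch.diag(Gamma.sum(dim=1)) - Gamma        # L_Gamma = D_Gamma - Gamma
    XT = X @ Theta
    return torch.relu(XT - 2.0 * delta * (L @ XT))  # sigma[(I - 2*delta*L_Gamma) X Theta]
```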
### AsymCheegerCutPool
AsymCheegerCutPool is a graph pooling layer that internally contains an $\texttt{MLP}$ parametrized by $\mathbf{\Theta}_\text{MLP}$ and computes:
- a cluster assignment matrix $\mathbf{S} = \texttt{Softmax}(\texttt{MLP}(\mathbf{X}; \mathbf{\Theta}_\text{MLP})) \in \mathbb{R}^{N\times K}$, which maps the $N$ vertices into $K$ clusters,
- an unsupervised loss $\mathcal{L} = \alpha_1 \mathcal{L}_ \text{GTV} + \alpha_2 \mathcal{L}_ \text{AN}$, where $\alpha_1$ and $\alpha_2$ are two hyperparameters,
- the adjacency matrix and the vertex features of a coarsened graph
$$\mathbf{A}^\text{pool} = \mathbf{S}^T \tilde{\mathbf{A}} \mathbf{S} \in\mathbb{R}^{K\times K}; \ \mathbf{X}^\text{pool}=\mathbf{S}^T\mathbf{X} \in\mathbb{R}^{K\times F}. $$
The term $\mathcal{L}_ \text{GTV}$ in the loss minimizes the graph total variation of the cluster assignments $\mathbf{S}$ and is defined as
$$\mathcal{L}_ \text{GTV} = \frac{\mathcal{L}_ \text{GTV}^*}{2E} \in [0, 1],$$
where $\mathcal{L}_\text{GTV}^* = \displaystyle\sum_{k=1}^K\sum_{i=1}^N \sum_{j=i}^N a_{ij} |s_{ik} - s_{jk}|$, $s_{ik}$ is the assignment of vertex $i$ to cluster $k$, and $E$ is the number of edges.
The term $\mathcal{L}_\text{AN}$ encourages the partition to be balanced and is defined as
$$\mathcal{L}_ {\text{AN}} = \frac{\beta - \mathcal{L}^*_ \text{AN}}{\beta} \in [0, 1],$$
where $\mathcal{L}_\text{AN}^* = \displaystyle\sum^K_{k=1} \lVert \boldsymbol{s}_{:,k} - \text{quant}_\rho(\boldsymbol{s}_{:,k}) \rVert_{1,\rho}$. When $\rho = K-1$, $\beta = N\rho$; for other values of $\rho$, $\beta = N\rho\min(1, K/(\rho+1))$. $\text{quant}_\rho(\boldsymbol{s}_{:,k})$ denotes the $\rho$-quantile of $\boldsymbol{s}_{:,k}$, and $\lVert\cdot\rVert_{1,\rho}$ denotes an asymmetric $\ell_1$ norm, which for a vector $\boldsymbol{x} \in \mathbb{R}^{N\times 1}$ is $\lVert\boldsymbol{x}\rVert_{1,\rho} = \displaystyle\sum^N_{i=1} |x_i|_\rho$, where $|x_i|_\rho = \rho x_i$ if $x_i \geq 0$ and $|x_i|_\rho = -x_i$ if $x_i < 0$.
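For concreteness, the sketch below computes $\mathbf{S}$, both loss terms, and the coarsened graph for a single dense graph. It fixes $\rho = K-1$ (so $\beta = N\rho$), takes $\text{quant}_\rho$ as the empirical $\rho/(\rho+1)$-quantile of each assignment column, and uses the plain adjacency in place of $\tilde{\mathbf{A}}$; these are simplifying assumptions of the sketch, not necessarily the repository's exact implementation:

```python
import torch

def asym_cheeger_cut_pool(X, A, mlp, alpha1=1.0, alpha2=1.0):
    # mlp: any callable mapping (N, F) features to (N, K) logits,
    # e.g. mlp = torch.nn.Linear(F, K)
    S = torch.softmax(mlp(X), dim=-1)               # cluster assignments (N, K)
    N, K = S.shape
    E = A.sum() / 2.0                               # number of undirected edges

    # L_GTV: total variation of S over the edges, normalized to [0, 1]
    gtv = (A * torch.cdist(S, S, p=1)).sum() / 2.0  # each edge counted once
    loss_gtv = gtv / (2.0 * E)

    # L_AN: asymmetric-norm balance term, here with rho = K - 1
    rho = K - 1
    q = torch.quantile(S, rho / (rho + 1.0), dim=0, keepdim=True)
    d = S - q
    asym = torch.where(d >= 0, rho * d, -d).sum()   # ||.||_{1,rho} summed over clusters
    loss_an = (N * rho - asym) / (N * rho)          # (beta - L_AN*) / beta

    # Coarsened adjacency and features
    A_pool = S.T @ A @ S                            # (K, K)
    X_pool = S.T @ X                                # (K, F)
    return X_pool, A_pool, alpha1 * loss_gtv + alpha2 * loss_an
```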
## Downstream tasks
We use TVGNN to perform vertex clustering and graph classification. Other tasks such as graph regression could be considered as well.
### Vertex clustering
This is an unsupervised task, where the goal is to generate a partition of the vertices based on the similarity of their vertex features and the graph topology. The GNN model is trained only by minimizing the unsupervised loss $\mathcal{L}$.
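A hypothetical training loop for this task is sketched below; the import path and the layer signatures are assumptions for illustration, so refer to the example scripts in this repository for the actual usage:

```python
import torch
# Hypothetical import: the module path and signatures are assumptions,
# not the repository's actual API.
from tvgnn_pytorch import GTVConv, AsymCheegerCutPool

# x: (N, F) node features, adj: (N, N) adjacency of the graph to cluster
conv = GTVConv(x.size(1), 64)
pool = AsymCheegerCutPool(k=10)         # partition into K = 10 clusters
opt = torch.optim.Adam(list(conv.parameters()) + list(pool.parameters()), lr=1e-3)

for epoch in range(1000):
    h = conv(x, adj)                    # message passing
    s, loss = pool(h, adj)              # assignments S and unsupervised loss L
    opt.zero_grad()
    loss.backward()                     # the only training signal is L
    opt.step()

clusters = s.argmax(dim=-1)             # hard cluster label per vertex
```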

### Graph classification
This is a supervised task with the goal of predicting the class of each graph. A GNN architecture for graph classification alternates GTVConv layers with graph pooling layers, which gradually distill the global label information from the vertex representations. The GNN is trained by minimizing the unsupervised loss $\mathcal{L}$ of each pooling layer and a supervised cross-entropy loss $\mathcal{L}_\text{cross-entr}$ between the true and predicted class labels.
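The sketch below shows one way to wire such an architecture, under the same hypothetical layer API as above; the pooling layer is assumed to return the coarsened features, the coarsened adjacency, and its auxiliary loss:

```python
import torch
import torch.nn.functional as F
from tvgnn_pytorch import GTVConv, AsymCheegerCutPool  # hypothetical import, as above

class TVGNNClassifier(torch.nn.Module):
    # GTVConv blocks alternated with pooling, then a global readout.
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.conv1 = GTVConv(in_dim, hidden)
        self.pool1 = AsymCheegerCutPool(k=32)
        self.conv2 = GTVConv(hidden, hidden)
        self.pool2 = AsymCheegerCutPool(k=8)
        self.out = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, adj):
        x = self.conv1(x, adj)
        x, adj, l1 = self.pool1(x, adj)   # coarsen + auxiliary loss
        x = self.conv2(x, adj)
        x, adj, l2 = self.pool2(x, adj)   # coarsen again
        return self.out(x.mean(dim=-2)), l1 + l2

# Training objective: supervised + unsupervised terms, e.g.
#   logits, aux = model(x, adj)
#   loss = F.cross_entropy(logits.unsqueeze(0), y) + aux
```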

## Implementation
### TensorFlow
This implementation is based on the Spektral library and follows the Select-Reduce-Connect API. To execute the code, first install the conda environment from `tf_environment.yml`:

```bash
conda env create -f tf_environment.yml
```
The `tensorflow/` folder includes:
- The implementation of the GTVConv layer
- The implementation of the AsymCheegerCutPool layer
- An example script to perform the clustering task
- An example script to perform the classification task
### PyTorch
This implementation is based on the PyTorch Geometric library. To execute the code, first install the conda environment from `pytorch_environment.yml`:

```bash
conda env create -f pytorch_environment.yml
```
The `pytorch/` folder includes:
- The implementation of the GTVConv layer
- The implementation of the AsymCheegerCutPool layer
- An example script to perform the clustering task
- An example script to perform the classification task
### Spektral
TVGNN is now available in Spektral:
- GTVConv layer,
- AsymCheegerCutPool layer,
- Example script to perform node clustering with TVGNN.
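A minimal Spektral sketch could look as follows; the input plumbing and any constructor arguments beyond the number of channels and clusters are assumptions here, so consult the Spektral documentation for the exact signatures:

```python
import tensorflow as tf
from spektral.layers import GTVConv, AsymCheegerCutPool

F = 16                                    # example feature dimension
x_in = tf.keras.Input(shape=(F,))         # node features, one row per vertex
a_in = tf.keras.Input(shape=(None,))      # dense adjacency (single mode)

x = GTVConv(32)([x_in, a_in])                       # message passing
x_pool, a_pool = AsymCheegerCutPool(10)([x, a_in])  # pool into K = 10 clusters

model = tf.keras.Model(inputs=[x_in, a_in], outputs=[x_pool, a_pool])
```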
## Citation
If you use TVGNN in your research, please consider citing our work as
```bibtex
@inproceedings{hansen2023tvgnn,
  title={Total Variation Graph Neural Networks},
  author={Hansen, Jonas Berg and Bianchi, Filippo Maria},
  booktitle={Proceedings of the 40th International Conference on Machine Learning},
  pages={},
  year={2023},
  organization={PMLR}
}
```