GraphCLIP
Official implementation of GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs

Updates
- [x] [2024.11.01] We have uploaded the source datasets.
- [x] [2024.11.03] We have uploaded the target datasets and the pretrained checkpoint.
- [ ] How to apply GraphCLIP on customized datasets.
1. Environment setup
conda create -n graphclip python=3.10
conda activate graphclip
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric
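After installation, a quick sanity check (a minimal snippet, not part of this repository) can confirm that PyTorch sees the GPU and that torch_geometric imports cleanly:

```python
# Quick environment sanity check (not part of the repository).
import torch
import torch_geometric

print(torch.__version__)            # expect 2.4.1
print(torch.cuda.is_available())    # expect True on a CUDA 12.1 machine
print(torch_geometric.__version__)
```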
2. Datasets
For source data
- This repository includes the smallest source dataset, i.e., pubmed. For the larger-scale source datasets, please download the generated graph summaries:
| Datasets | Links |
|---|---|
| OGBN-ArXiv | Google Drive |
| ArXiv_2023 | Google Drive |
| Reddit | Google Drive |
| OGBN-Products | Google Drive |
- Once downloaded, unzip the files and place them in the `summary` directory.
- For convenience, we also provide the processed data, which includes the graph structure and node features. Please download it from the links below:
| Datasets | Links |
|---|---|
| OGBN-ArXiv | Google Drive |
| ArXiv_2023 | Google Drive |
| Reddit | Google Drive |
| OGBN-Products | Google Drive |
- Once downloaded, unzip the files and place them in the `processed_data` directory.
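The processed files bundle the graph structure and node features. Assuming they are serialized PyTorch Geometric `Data` objects (the file name below is hypothetical; check the unzipped contents for the real one), a quick inspection could look like this:

```python
# Illustrative only: assumes the processed files are serialized PyG Data objects.
# The file name below is hypothetical; check the unzipped contents for the real one.
import torch
from torch_geometric.data import Data

graph = torch.load("processed_data/ogbn-arxiv.pt")  # hypothetical path
assert isinstance(graph, Data)
print(graph)                          # e.g. Data(x=[N, d], edge_index=[2, E], y=[N])
print(graph.num_nodes, graph.num_edges)
```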
For target data
- For the target datasets, only the processed data is needed: download it, unzip it, and place it in the `processed_data` directory.
| Datasets | Links |
|---|---|
| WikiCS | Google Drive |
| Instagram | Google Drive |
| Ele-Photo | Google Drive |
| Ele-Computers | Google Drive |
| Books-History | Google Drive |
- To generate subgraphs for each target dataset, run `bash gen_target_subg.sh` (a conceptual sketch of local subgraph extraction follows below).
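For intuition only, the sketch below shows one generic way to extract a local subgraph around each node with PyTorch Geometric utilities. It is not the logic of `gen_target_subg.sh`; the repository's sampler may differ.

```python
# Conceptual sketch of per-node local subgraph extraction, NOT the repository's
# gen_target_subg.sh logic; it only illustrates the general idea with PyG utilities.
import torch
from torch_geometric.data import Data
from torch_geometric.utils import k_hop_subgraph

def extract_local_subgraph(data: Data, center: int, num_hops: int = 2) -> Data:
    """Return the induced k-hop subgraph around `center` as a new Data object."""
    node_idx, edge_index, mapping, _ = k_hop_subgraph(
        center, num_hops, data.edge_index,
        relabel_nodes=True, num_nodes=data.num_nodes,
    )
    sub = Data(x=data.x[node_idx], edge_index=edge_index)
    sub.center = mapping  # position of the center node inside the subgraph
    return sub
```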
3. Pretraining on source data
Downloading and loading our pretrained checkpoint
To get started, download our released checkpoint and unzip the content. Place the extracted files into the `checkpoints` directory. You can then use this checkpoint directly on your target datasets, as outlined in the next section.
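Internally, loading a checkpoint boils down to the usual `torch.load` / `load_state_dict` pattern. The snippet below is only an illustration of that pattern; the file name and model constructor are placeholders, not the repository's actual identifiers.

```python
# Generic checkpoint-loading pattern; file and model names are placeholders,
# not the repository's actual identifiers.
import torch

state = torch.load("checkpoints/pretrained_graphclip.pt", map_location="cpu")  # placeholder path
# model = build_graphclip_model(...)   # placeholder: defined by the repository's own code
# model.load_state_dict(state)
# model.eval()
```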
Or training from scratch
# We provide the smallest source data (pubmed) for running our codes
# single gpu
CUDA_VISIBLE_DEVICES=0 python train.py --source_data pubmed --batch_size 1024 --epochs 30
# multiple gpus
CUDA_VISIBLE_DEVICES=0,1 python train.py --source_data pubmed --batch_size 1024 --epochs 30
# reproduce our results
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py --source_data ogbn-arxiv+arxiv_2023+pubmed+ogbn-products+reddit --batch_size 7200 --epochs 30
We use 8 A100 (40 GB) GPUs for pretraining, which takes roughly 7 hours.
The code supports Data Parallel training, so you can assign multiple GPUs as in the commands above.
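For reference, multi-GPU Data Parallel in PyTorch typically amounts to wrapping the model as in the generic sketch below; this is the standard PyTorch pattern, not the repository's exact training code.

```python
# Generic PyTorch Data Parallel pattern (not the repository's exact training code).
import torch

def wrap_for_data_parallel(model: torch.nn.Module) -> torch.nn.Module:
    """Replicate the model across all visible GPUs when more than one is available."""
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)
    return model.cuda() if torch.cuda.is_available() else model
```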
4. Zero-shot learning on target data
We provide a sample target dataset (citeseer) for running our code. By default, this will load your pretrained checkpoint.
CUDA_VISIBLE_DEVICES=0 python eval.py --target_data citeseer
More target datasets can be evaluated, e.g., --target_data cora+citeseer+wikics+instagram+photo+computer+history
To reproduce our experiments, pass the name of the downloaded pretrained checkpoint via the --ckpt flag.
CUDA_VISIBLE_DEVICES=0 python eval.py --target_data cora+citeseer+wikics+instagram+photo+computer+history --ckpt pretrained_graphclip
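Conceptually, CLIP-style zero-shot classification compares each (sub)graph embedding against text embeddings of the class names and picks the most similar one. The sketch below shows only that scoring step, with placeholder encoders; it is not the repository's eval.py.

```python
# CLIP-style zero-shot scoring: cosine similarity between graph embeddings and
# label-text embeddings. The encoders producing these tensors are placeholders.
import torch
import torch.nn.functional as F

def zero_shot_predict(graph_emb: torch.Tensor, label_emb: torch.Tensor) -> torch.Tensor:
    """graph_emb: [N, d] graph embeddings; label_emb: [C, d] class-text embeddings."""
    graph_emb = F.normalize(graph_emb, dim=-1)
    label_emb = F.normalize(label_emb, dim=-1)
    logits = graph_emb @ label_emb.t()   # cosine similarities, shape [N, C]
    return logits.argmax(dim=-1)         # predicted class index per node/subgraph
```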
(a) Generate graph summaries locally
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python generate_summary.py --dataset arxiv_2023 --walk_step 128 --restart 0.8
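The --walk_step and --restart flags suggest that neighborhood context is gathered with a random walk with restart before summaries are generated. The sketch below is a minimal, generic walk under that assumption; the actual generate_summary.py may work differently.

```python
# Minimal random-walk-with-restart sketch, assuming that is what --walk_step and
# --restart control in generate_summary.py; the real script may differ.
import random

def rwr_neighborhood(adj: dict, start: int, walk_step: int = 128, restart: float = 0.8) -> set:
    """adj maps node -> list of neighbors; returns nodes visited by the walk."""
    visited, cur = {start}, start
    for _ in range(walk_step):
        neighbors = adj.get(cur, [])
        if not neighbors or random.random() < restart:
            cur = start                    # restart the walk at the seed node
        else:
            cur = random.choice(neighbors)
        visited.add(cur)
    return visited
```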