Graphormer
Cannot reproduce PCQM4M-LSC results
Hi there,
I tried to reproduce the results of Graphormer_SMALL on the PCQM4M-LSC dataset using the v1.0 branch, with the following script:
[ -z "${exp_name}" ] && exp_name="pcq"
[ -z "${seed}" ] && seed="1"
[ -z "${arch}" ] && arch="--ffn_dim 512 --hidden_dim 512 --weight_decay 0.0 --intput_dropout_rate 0.0 --dropout_rate 0.1 --n_layers 6 --peak_lr 3e-4 --edge_type multi_hop --multi_hop_max_dist 5"
[ -z "${batch_size}" ] && batch_size="256"
echo -e "\n\n"
echo "=====================================ARGS======================================"
echo "arg0: $0"
echo "exp_name: ${exp_name}"
echo "arch: ${arch}"
echo "seed: ${seed}"
echo "batch_size: ${batch_size}"
echo "==============================================================================="
default_root_dir="../../exps/pcq/$exp_name/$seed"
mkdir -p $default_root_dir
n_gpu=$(nvidia-smi -L | wc -l)
python ../../graphormer/entry.py --num_workers 8 --seed $seed --batch_size $batch_size \
--dataset_name PCQM4M-LSC \
--gpus $n_gpu --accelerator ddp --precision 16 --gradient_clip_val 5.0 \
$arch \
--default_root_dir $default_root_dir
My machine also has 8 GPUs. With this script, is the effective batch size 256 * 8 = 2048? After training, the TensorBoard curves show a valid_mae of about 0.135, whereas the paper reports 0.1264 for Graphormer_SMALL. Do you have any idea what causes this performance gap? A batch size of 1024 is suggested, so should I set the batch size to 128 (1024 / 8) on an 8-GPU machine?
Did you run the baseline methods in Table 1 yourselves to report their performance?
I would really appreciate it if you could answer these questions!
@chaoyan1037 Hi, how do you reproduce the results of Graphormer_SMALL on the PCQM4M-LSC dataset using the v1.0 branch?
I created the environment following https://github.com/microsoft/Graphormer/blob/v1.0/README.md:
# create a new environment
conda create --name graphormer python=3.7
conda activate graphormer
# install requirements
pip install rdkit-pypi cython
pip install ogb==1.3.1 pytorch-lightning==1.3.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-geometric==1.6.3 ogb==1.3.1 pytorch-lightning==1.3.1 tqdm torch-sparse==0.6.9 torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html
Then I ran the following script:
[ -z "${exp_name}" ] && exp_name="pcq"
[ -z "${seed}" ] && seed="1"
[ -z "${arch}" ] && arch="--ffn_dim 768 --hidden_dim 768 --dropout_rate 0.1 --n_layers 12 --peak_lr 2e-4 --edge_type multi_hop --multi_hop_max_dist 5"
[ -z "${batch_size}" ] && batch_size="256"
echo -e "\n\n"
echo "=====================================ARGS======================================"
echo "arg0: $0"
echo "exp_name: ${exp_name}"
echo "arch: ${arch}"
echo "seed: ${seed}"
echo "batch_size: ${batch_size}"
echo "==============================================================================="
default_root_dir="../../exps/pcq/$exp_name/$seed"
mkdir -p $default_root_dir
n_gpu=$(nvidia-smi -L | wc -l)
python ../../graphormer/entry.py --num_workers 8 --seed $seed --batch_size $batch_size \
--dataset_name PCQM4M-LSC \
--gpus $n_gpu --accelerator ddp --precision 16 --gradient_clip_val 5.0 \
$arch \
--default_root_dir $default_root_dir
There is an error:
Traceback (most recent call last):
File "../../graphormer/entry.py", line 4, in <module>
from model import Graphormer
File "/home/linjiayi/Graphormer/graphormer/model.py", line 4, in <module>
from data import get_dataset
File "/home/linjiayi/Graphormer/graphormer/data.py", line 5, in <module>
from wrapper import MyGraphPropPredDataset, MyPygPCQM4MDataset, MyZINCDataset
File "/home/linjiayi/Graphormer/graphormer/wrapper.py", line 6, in <module>
import torch_geometric.datasets
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_geometric/__init__.py", line 2, in <module>
import torch_geometric.nn
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
from .data_parallel import DataParallel
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
from torch_geometric.data import Batch
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_geometric/data/data.py", line 8, in <module>
from torch_sparse import coalesce, SparseTensor
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch_sparse/__init__.py", line 15, in <module>
f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/site-packages/torch/_ops.py", line 105, in load_library
ctypes.CDLL(path)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.11: cannot open shared object file: No such file or directory
How can I solve it?
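The `OSError` above means the dynamic linker could not find the CUDA 11 cuSPARSE library (`libcusparse.so.11`) needed by the binary that `torch_sparse` loads. A small diagnostic sketch, assuming a Linux machine; the function names are illustrative and nothing here comes from the Graphormer repository.

```bash
# Illustrative diagnostics; function names are hypothetical.

# Turn a CUDA version such as "11.0" into the wheel tag "cu110".
# An empty value or "None" (CPU-only PyTorch build) maps to "cpu".
cuda_tag() {
    case "$1" in
        ""|None) echo "cpu" ;;
        *)       echo "cu$(printf '%s' "$1" | tr -d '.')" ;;
    esac
}

# Print the CUDA build of the installed PyTorch and whether the linker
# cache lists libcusparse. Each check is skipped if its tool is missing.
report_cuda_setup() {
    if command -v python >/dev/null 2>&1; then
        if torch_cuda=$(python -c 'import torch; print(torch.version.cuda)' 2>/dev/null); then
            echo "PyTorch CUDA tag: $(cuda_tag "$torch_cuda")"
        else
            echo "PyTorch could not be imported"
        fi
    fi
    if command -v ldconfig >/dev/null 2>&1; then
        # A miss here is only a hint: the library may also be found
        # through LD_LIBRARY_PATH or the conda environment's lib directory.
        ldconfig -p | grep libcusparse || echo "libcusparse not found in linker cache"
    fi
}
```

Calling `report_cuda_setup` inside the activated environment shows whether the CUDA tag of PyTorch matches the `cu110` wheels requested during installation.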
Hi @skye95git,
I do not have this problem. I guess it is related to your CUDA installation. You may find a similar issue here.
Thanks for your reply! I have solved it. The cause was a torch-geometric installation error.
I ran into the same problem. Can you tell me how you solved it?
I reinstalled torch-geometric.
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=10.2 -c pytorch
wget https://data.pyg.org/whl/torch-1.9.0%2Bcu102/torch_cluster-1.5.9-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.9.0%2Bcu102/torch_scatter-2.0.7-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.9.0%2Bcu102/torch_sparse-0.6.11-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.9.0%2Bcu102/torch_spline_conv-1.2.1-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.5.9-cp39-cp39-linux_x86_64.whl
pip install torch_scatter-2.0.7-cp39-cp39-linux_x86_64.whl
pip install torch_sparse-0.6.11-cp39-cp39-linux_x86_64.whl
pip install torch_spline_conv-1.2.1-cp39-cp39-linux_x86_64.whl
pip install torch-geometric
Update: follow the official installation guide to install torch-geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
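The wheels listed above are tied to one combination of PyTorch version (1.9.0), CUDA build (cu102), and Python version (cp39), so they only install cleanly in a matching environment. As a sketch, the helpers below build the wheel index URL in the form given in the torch-geometric installation guide, `https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html`, and check a wheel filename against a Python tag. Both function names are hypothetical, and whether an index exists for a given combination is an assumption to verify.

```bash
# Hypothetical helper: build the PyG wheel index URL for a given
# PyTorch version and CUDA tag (for example "1.9.0" and "cu102").
pyg_wheel_index() {
    echo "https://data.pyg.org/whl/torch-$1+$2.html"
}

# Hypothetical helper: succeed if a wheel filename carries the given
# Python tag (for example "cp39" for Python 3.9).
wheel_matches_python() {
    case "$1" in
        *-"$2"-*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print (rather than run) an install command that uses the index.
index=$(pyg_wheel_index 1.9.0 cu102)
echo "pip install torch-scatter torch-sparse -f ${index}"
```

Letting pip resolve wheels from the index avoids downloading each `.whl` by hand, at the cost of relying on the index still hosting that PyTorch and CUDA combination.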
Thanks a lot! I've run the example.
Thanks for using Graphormer, and sorry for the late response. Does the issue still persist? If so, please try the prepared scripts for training on PCQ with V2, which should reproduce the expected results easily. link
Thanks for your response! Are you referring to the V2 dataset or the V2 code?
I still cannot reproduce the results with the V1 code. I find the V1 code easier to understand. Is there a script for the V1 code?
Yes, the script for the V1 code can be found here: https://github.com/microsoft/Graphormer/tree/v1.0/examples/ogb-lsc#example-usage, but the V2 code is easier to use. Please feel free to give it a try.
Thank you very much for your quick response!
I used this script with the V1 code but cannot reproduce the results. That is exactly my original question.
That is the script for Graphormer_Base. Could you try running the Base model? If the problem still exists, we will look into it. In the meantime, the V2 code with its scripts is highly recommended: it offers higher performance, faster training, and a more robust training process.
Yes, I am trying to train Graphormer on PCQM4M-LSC from scratch. Thanks for the reminder! Since I have adapted the V1 code for my own research, I prefer to use V1. I may switch to V2 later for these benefits.