pytorch_geometric Add GraphLand benchmark

GraphLand is a new graph benchmark for node property prediction that covers diverse industrial applications and includes graphs with different sizes, structural characteristics, and feature sets.

Sep 16 '25 15:09 gvbazhenov

Could someone help me to fix the problems with the imports of pandas, sklearn and yaml that are required in the implemented class? I do not understand how to organize them so that the tests pass.

Sep 16 '25 15:09 gvbazhenov

Alright, moving the imports into function bodies has solved our problems. However, it seems that yaml (PyYAML) is not installed in the test environment. How can we fix that so that we can use yaml in our implementation?

Sep 17 '25 11:09 gvbazhenov

Codecov Report

All modified and coverable lines are covered by tests. Project coverage is 85.09%. Comparing base (c211214) to head (8ef967d). Report is 139 commits behind head on master.

@@            Coverage Diff             @@
##           master   #10458      +/-   ##
==========================================
- Coverage   86.11%   85.09%   -1.03%     
==========================================
  Files         496      510      +14     
  Lines       33655    35964    +2309     
==========================================
+ Hits        28981    30602    +1621     
- Misses       4674     5362     +688

Sep 17 '25 11:09 codecov[bot]

There are also some type-checking linter errors, but as far as I can tell they are not limited to torch_geometric/datasets/graphland.py. Can we skip them if all other tests pass?

Sep 17 '25 11:09 gvbazhenov

@rusty1s @akihironitta @wsad1 I wanted to kindly ping you about this PR and share an update: the GraphLand benchmark has been accepted to NeurIPS this year, which I believe highlights its potential value for the community. I would greatly appreciate it if you could find some time to review the changes. I think merging this would be very timely and beneficial for many users. Thanks for your help!

Oct 01 '25 15:10 gvbazhenov

@puririshi98 Thanks for picking up our PR! I have managed to fix the linter issues and also added an example on using GraphLand datasets for node property prediction. Hope this can be merged now.

Oct 06 '25 19:10 gvbazhenov

For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg

Can you share a log of running the example?

Oct 06 '25 20:10 puririshi98

@puririshi98 Here are the commands I ran from the root of the pytorch_geometric repository:

> docker run --gpus all -it --network=host --rm --mount type=bind,source=(pwd),target=/workspace nvcr.io/nvidia/pyg:25.09-py3 bash
> pip uninstall torch-geometric
... Successfully uninstalled torch-geometric-2.7.0
> pip install .
... Successfully installed torch-geometric-2.7.0
> cd examples
> python graphland.py --name tolokers-2 --split RL
Extracting datasets/tolokers-2/raw/tolokers-2.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:03<00:00, 28.84it/s, loss=0.4327, train=49.89, val=43.11, test=44.79]
Best metrics: train=49.78, val=43.11, test=44.76
> python graphland.py --name avazu-ctr --split THI
Extracting datasets/avazu-ctr/raw/avazu-ctr.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:27<00:00,  3.63it/s, loss=0.7874, train=21.32, val=15.32, test=27.06]
Best metrics: train=21.32, val=15.32, test=27.06
> rm -rf datasets
> exit
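In the tolokers-2 run, the "Best metrics" line shares its validation score with the last epoch's progress-bar values but has different train and test numbers. That is consistent with reporting the epoch with the best validation metric and keeping the earliest one on ties. The sketch below shows that selection rule; it is an assumption about how examples/graphland.py behaves, not its actual code, and `select_best` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional

Metrics = Dict[str, float]


def select_best(history: Iterable[Metrics], key: str = 'val') -> Metrics:
    """Return the metrics of the epoch with the highest score under `key`.

    A later epoch replaces the current best only if it is strictly better,
    so ties keep the earliest epoch.
    """
    best: Optional[Metrics] = None
    for metrics in history:
        if best is None or metrics[key] > best[key]:
            best = metrics
    if best is None:
        raise ValueError('history is empty')
    return best
```

Selecting on validation rather than test keeps the reported test score an unbiased estimate of the chosen model.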

And the logs of pytest:

> pytest test/datasets/test_graphland.py
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.18, pytest-8.4.2, pluggy-1.6.0 -- .../bin/python3.10
cachedir: .pytest_cache
rootdir: ...
configfile: pyproject.toml
plugins: xdist-3.8.0
collected 6 items                                                                                                                                

test/datasets/test_graphland.py::test_transductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[avazu-ctr] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[avazu-ctr] PASSED

================================================================ warnings summary ================================================================
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories]
  .../lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:246: UserWarning: Found unknown categories in columns [3] during transform. These unknown categories will be encoded as all zeros
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== 6 passed, 1 warning in 92.21s (0:01:32) =====================================================
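The warning comes from scikit-learn's categorical encoder and appears only in the inductive test. A plausible reading, assumed here rather than confirmed in the thread, is that the encoding is fitted on the nodes available during training and then applied to nodes with category values never seen during fitting; as the warning says, those values are encoded as all zeros. The pure-Python sketch below mirrors that behavior. The helper names and the category values are made up for illustration.

```python
from typing import Dict, Hashable, List, Sequence


def fit_categories(column: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Assign an index to each category seen in the fitting data (e.g. train nodes)."""
    # dict.fromkeys keeps first-appearance order and drops duplicates.
    return {cat: i for i, cat in enumerate(dict.fromkeys(column))}


def one_hot(column: Sequence[Hashable],
            mapping: Dict[Hashable, int]) -> List[List[int]]:
    """One-hot encode `column`; categories absent from `mapping` become all-zero rows."""
    rows = []
    for value in column:
        row = [0] * len(mapping)
        idx = mapping.get(value)
        if idx is not None:  # known category: set its slot
            row[idx] = 1
        rows.append(row)
    return rows


# Inductive-style usage: fit on training nodes only, then encode other nodes.
mapping = fit_categories(['shoes', 'bags', 'shoes'])
print(one_hot(['bags', 'hats'], mapping))  # [[0, 1], [0, 0]]
```

In scikit-learn, `OneHotEncoder(handle_unknown='ignore')` gives the same all-zero encoding for unseen categories instead of raising an error.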

Oct 07 '25 11:10 gvbazhenov

Note: we want to run the CI weekly instead of after every commit. I will look into this when I'm back, unless @akihironitta or @gvbazhenov can set it up.

Oct 10 '25 18:10 puririshi98

@akihironitta Thanks for your comments! I have solved those problems with copying objects.

Oct 13 '25 15:10 gvbazhenov

@akihironitta @puririshi98 Sorry to bother you, I just wanted to ask whether anything else is required from my side to get this PR merged. Thanks for your help!

Nov 25 '25 15:11 gvbazhenov

Hi everyone! I have pushed an update that adds default preprocessing for the introduced datasets. The CI is failing, but this seems to be an unrelated issue with the nightly test; the rest of the checks pass. Since the new changes are minor, I believe the PR is ready to be merged. Thanks!

Dec 02 '25 12:12 gvbazhenov

Hi! One of the authors of the GraphLand paper here. Our benchmark has attracted quite a bit of attention at the recent NeurIPS and LoG conferences, and it seems like there are a lot of people wanting to experiment with it. Thus, it would be great if you could merge this pull request. PyG is the most popular library for graph ML and a lot of people rely on it for access to standard datasets, so it would be very convenient if it also includes GraphLand. Thanks in advance! @rusty1s @akihironitta @puririshi98 @wsad1

Dec 15 '25 14:12 OlegPlatonov