Add GraphLand benchmark
GraphLand is a new graph benchmark for node property prediction that covers diverse industrial applications and includes graphs with different sizes, structural characteristics, and feature sets.
Could someone help me to fix the problems with the imports of pandas, sklearn and yaml that are required in the implemented class? I do not understand how to organize them so that the tests pass.
Alright, moving the imports into function bodies has solved our problems. However, it seems that yaml is not installed in the testing environment. How can we fix that and still use yaml in our implementation?
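For anyone hitting the same issue, here is a minimal sketch of the pattern discussed above: import the optional package inside the function that needs it, and skip the test when the package is missing. The names `require`, `has_package`, `ToyDataset` and `ToyDatasetTest` are hypothetical and only for illustration; this is not the code from this PR nor PyG's own testing helpers.

```python
import importlib
import importlib.util
import unittest


def require(module_name: str):
    """Import `module_name` at call time, with a readable error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required here but is not installed") from exc


def has_package(module_name: str) -> bool:
    """Return True if the top-level package `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


class ToyDataset:
    """Hypothetical stand-in for a dataset class, not the class from this PR."""
    def parse_info(self, text: str) -> dict:
        # Deferred import: PyYAML is only needed when this method runs,
        # so importing the module that defines ToyDataset never fails.
        yaml = require("yaml")
        return yaml.safe_load(text)


class ToyDatasetTest(unittest.TestCase):
    # The test is skipped (not failed) when PyYAML is unavailable.
    @unittest.skipUnless(has_package("yaml"), "PyYAML is not installed")
    def test_parse_info(self):
        info = ToyDataset().parse_info("task: regression")
        self.assertEqual(info, {"task": "regression"})
```

With this layout, the package stays importable without yaml, and the test suite reports a skip instead of an import error in environments where yaml is absent.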
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 85.09%. Comparing base (c211214) to head (8ef967d).
:warning: Report is 139 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #10458 +/- ##
==========================================
- Coverage 86.11% 85.09% -1.03%
==========================================
Files 496 510 +14
Lines 33655 35964 +2309
==========================================
+ Hits 28981 30602 +1621
- Misses 4674 5362 +688
There are also some linter issues with types, but as far as I understand, they are not limited to torch_geometric/datasets/graphland.py. Can we skip them if all other tests pass?
@rusty1s @akihironitta @wsad1 I wanted to kindly ping regarding this PR and share an update: GraphLand benchmark has been accepted to NeurIPS this year, which I believe highlights its potential value for the community. I would greatly appreciate it if you could find some time to review the changes — I think merging this would be very timely and beneficial for many users. Thanks for your help!
@puririshi98 Thanks for picking up our PR! I have fixed the linter issues and also added an example of using GraphLand datasets for node property prediction. I hope this can be merged now.
For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg
Can you share a log of running the example?
@puririshi98 Here are the commands I executed from the root of the pytorch_geometric repository:
> docker run --gpus all -it --network=host --rm --mount type=bind,source=(pwd),target=/workspace nvcr.io/nvidia/pyg:25.09-py3 bash
> pip uninstall torch-geometric
... Successfully uninstalled torch-geometric-2.7.0
> pip install .
... Successfully installed torch-geometric-2.7.0
> cd examples
> python graphland.py --name tolokers-2 --split RL
Extracting datasets/tolokers-2/raw/tolokers-2.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:03<00:00, 28.84it/s, loss=0.4327, train=49.89, val=43.11, test=44.79]
Best metrics: train=49.78, val=43.11, test=44.76
> python graphland.py --name avazu-ctr --split THI
Extracting datasets/avazu-ctr/raw/avazu-ctr.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:27<00:00, 3.63it/s, loss=0.7874, train=21.32, val=15.32, test=27.06]
Best metrics: train=21.32, val=15.32, test=27.06
> rm -rf datasets
> exit
And the logs of pytest:
> pytest test/datasets/test_graphland.py
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.18, pytest-8.4.2, pluggy-1.6.0 -- .../bin/python3.10
cachedir: .pytest_cache
rootdir: ...
configfile: pyproject.toml
plugins: xdist-3.8.0
collected 6 items
test/datasets/test_graphland.py::test_transductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[avazu-ctr] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[avazu-ctr] PASSED
================================================================ warnings summary ================================================================
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories]
.../lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:246: UserWarning: Found unknown categories in columns [3] during transform. These unknown categories will be encoded as all zeros
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== 6 passed, 1 warning in 92.21s (0:01:32) =====================================================
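A note on the single warning in the log above: it says that categories not seen when the encoder was fitted are encoded as all zeros. The pure-Python sketch below only illustrates that behaviour; it is not scikit-learn's implementation, and the helper names `fit_categories` and `one_hot` as well as the toy data are made up for this example.

```python
def fit_categories(column):
    """Collect the sorted distinct values seen at fit time."""
    return sorted(set(column))


def one_hot(value, categories):
    """Encode `value` as a one-hot row; unseen values become all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]


train = ["red", "green", "red"]
test = ["green", "blue"]  # "blue" was never seen when fitting

cats = fit_categories(train)  # ['green', 'red']
encoded = [one_hot(v, cats) for v in test]
print(encoded)  # [[1.0, 0.0], [0.0, 0.0]]
```

The row for "blue" is all zeros because it matches none of the fitted categories, which is what the warning describes for column 3.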
Note: we want to run this CI weekly instead of after every commit. I will look into this when I'm back, unless @akihironitta or @gvbazhenov can set it up.
@akihironitta Thanks for your comments! I have fixed the problems with copying objects.
@akihironitta @puririshi98 Excuse me, just wanted to know if anything else is required from my side in order to get this PR merged. Thanks for your help!
Hi everyone! I have pushed an update that adds the default preprocessing for the introduced datasets. It looks like the CI is failing, but it seems to be an unrelated issue with the nightly test. The rest of the checks are passing. Since the new changes are minor, I suppose that the PR is ready to be merged. Thanks!
Hi! One of the authors of the GraphLand paper here. Our benchmark has attracted quite a bit of attention at the recent NeurIPS and LoG conferences, and it seems like there are a lot of people wanting to experiment with it. Thus, it would be great if you could merge this pull request. PyG is the most popular library for graph ML and a lot of people rely on it for access to standard datasets, so it would be very convenient if it also includes GraphLand. Thanks in advance! @rusty1s @akihironitta @puririshi98 @wsad1