pytorch_geometric

[Roadmap] GraphGym via PyTorch Lightning and Hydra 🚀

Open • rusty1s opened this issue 3 years ago

🚀 The feature, motivation and pitch

The overall goal of this roadmap is to ensure a tighter connection between PyG core and the GraphGym configuration manager. An additional goal is to avoid reinventing the wheel in GraphGym and instead make use of popular open-source frameworks whenever applicable, e.g., for configuration management, training, logging, and AutoML.

This roadmap is therefore organized into several components: general improvements (e.g., a tighter connection between PyG and GraphGym), PyTorch Lightning integration, and Hydra integration as our configuration tool.

General Roadmap

  • [ ] Add registration functionality for models in PyG core
  • [ ] Remove all layer/model definitions from GraphGym and move them to PyG core
  • [ ] Expose a graphgym bash script in a bin/ folder; using GraphGym should not require manually cloning PyG
  • [ ] Better and more user-friendly documentation
  • [ ] Add HeteroData support
  • [ ] Add pooling layers
  • [ ] ...
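
As a rough illustration of the first item, a registry maps the string names used in a config file to model classes, so a configuration can refer to any registered model by name. The sketch below is a hypothetical, framework-free version: `MODEL_REGISTRY`, `register_model`, `build_model` and `ToyGCN` are illustrative names, not PyG's API (GraphGym currently ships its own `torch_geometric.graphgym.register` module).

```python
from typing import Any, Callable, Dict, Type

# Hypothetical global registry: config name -> model class.
MODEL_REGISTRY: Dict[str, Type] = {}


def register_model(name: str) -> Callable[[Type], Type]:
    """Class decorator that records `cls` under `name` (hypothetical helper)."""
    def decorator(cls: Type) -> Type:
        if name in MODEL_REGISTRY:
            raise KeyError(f"Model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def build_model(name: str, **kwargs: Any) -> Any:
    """Instantiate a registered model from its config name and keyword arguments."""
    try:
        cls = MODEL_REGISTRY[name]
    except KeyError:
        raise KeyError(
            f"Unknown model '{name}'; registered: {sorted(MODEL_REGISTRY)}"
        ) from None
    return cls(**kwargs)


# Stand-in for a PyG model such as GCN; a real model would subclass torch.nn.Module.
@register_model("GCN")
class ToyGCN:
    def __init__(self, in_channels: int, hidden_channels: int,
                 num_layers: int, out_channels: int) -> None:
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.num_layers = num_layers
        self.out_channels = out_channels


model = build_model("GCN", in_channels=34, hidden_channels=16,
                    num_layers=2, out_channels=4)
print(type(model).__name__, model.num_layers)  # ToyGCN 2
```

Keeping such a registry in PyG core rather than in GraphGym would, under this design, let any registered model be referenced from a config without a GraphGym-specific wrapper.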

PyTorch Lightning Integration

Integrating PyTorch Lightning can improve the GraphGym training experience in terms of scalability, mixed-precision support, logging, and checkpointing.

  • [x] Integrate a LightningModule into GraphGym
  • [x] Update the train method to use the PL Trainer and the LightningModule implementation
  • [ ] Refactor load_ckpt and save_ckpt with PL checkpoint save and load method
  • [ ] Integrate LightningDataset, LightningNodeData and LightningLinkData modules
  • [ ] ...

Hydra Integration

Users of PyG should be able to write GraphGym configurations that make full use of PyG functionality. In particular, we want to allow access to any dataset, any data transformation pipeline, and any GNN layer/model. This requires a structured, composable configuration, e.g., as introduced here:

```yaml
defaults:
  - dataset: KarateClub
  - transform@dataset.transform:
      - NormalizeFeatures
      - AddSelfLoops
  - model: GCN
  - optimizer: Adam
  - lr_scheduler: ReduceLROnPlateau
  - _self_

model:
  in_channels: 34
  out_channels: 4
  hidden_channels: 16
  num_layers: 2
```

  • [ ] Use variable interpolation, e.g., model.in_channels = ${dataset.num_features} and model.out_channels = ${dataset.num_classes}
  • [ ] ...
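
To illustrate the variable-interpolation item, the toy resolver below replaces `${dotted.path}` values with values looked up elsewhere in the same nested config, which OmegaConf (Hydra's config backend) does natively. It is only a sketch: `resolve` and its helpers are hypothetical names, only whole-value references are handled, and the dataset values assume KarateClub, which has 34 node features and 4 classes, matching the hard-coded `in_channels: 34` and `out_channels: 4` in the config above.

```python
import re
from typing import Any, Dict

# Matches a value that is exactly one reference, e.g. "${dataset.num_features}".
_REF = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def _lookup(cfg: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'dataset.num_features' through nested dicts."""
    node: Any = cfg
    for key in path.split("."):
        node = node[key]
    return node


def _resolve(cfg: Dict[str, Any], node: Any) -> Any:
    if isinstance(node, dict):
        return {key: _resolve(cfg, value) for key, value in node.items()}
    if isinstance(node, list):
        return [_resolve(cfg, value) for value in node]
    if isinstance(node, str):
        match = _REF.fullmatch(node)
        if match:
            # The target may itself be a reference, so resolve it again.
            # Note: this toy version does no cycle detection.
            return _resolve(cfg, _lookup(cfg, match.group(1)))
    return node


def resolve(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config with every whole-value ${...} reference replaced."""
    return _resolve(cfg, cfg)


# Plain-dict version of the config, assuming the dataset exposes
# its feature and class counts (34 and 4 for KarateClub).
config = {
    "dataset": {"name": "KarateClub", "num_features": 34, "num_classes": 4},
    "model": {
        "in_channels": "${dataset.num_features}",
        "out_channels": "${dataset.num_classes}",
        "hidden_channels": 16,
        "num_layers": 2,
    },
}

resolved = resolve(config)
print(resolved["model"]["in_channels"], resolved["model"]["out_channels"])  # 34 4
```

Unlike this sketch, OmegaConf also supports references embedded inside longer strings and custom resolvers, and resolves interpolations lazily on access (or eagerly via `OmegaConf.resolve`).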

Weights & Biases Integration (TBD)

  • [ ] ...

AutoML (TBD)

  • [ ] ...

cc @pyg-team/biotax-team

rusty1s, Aug 04 '22

> Integrate LightningDataset, LightningNodeData and LightningLinkData modules

New here: what do LightningNodeData and LightningLinkData refer to?

> Refactor load_ckpt and save_ckpt with PL checkpoint save and load method

Is this still needed after #4689?

julian-q, Sep 11 '22

@julian-q Welcome :) LightningDataset, LightningNodeData and LightningLinkData refer to our helper data modules that connect PyG with PL, see here. Currently, they are not used within GraphGym.

> Is this still needed after https://github.com/pyg-team/pytorch_geometric/pull/4689?

I assume so. load_ckpt and save_ckpt do not appear to make use of PL checkpoints yet.

rusty1s, Sep 15 '22

I would like to contribute to this task. I have previously worked on using PyTorch Lightning and Hydra together in this repo.

shenoynikhil, Oct 24 '22

This is amazing. We should collect some information about how we want to integrate Hydra into GraphGym, as I believe we need a new config layout. I started something a long time ago but never finished it, see here, here and here. I would very much appreciate your advice and insights!

rusty1s, Oct 25 '22

I'll spend some time going through the links you shared and start a draft PR for this. I hope to get your guidance on it as well :)

shenoynikhil, Oct 25 '22

@rusty1s I would like to try this!

rajveer43, Aug 09 '23

> I would like to try this!

Cool :) We were sadly a bit lazy in the further development of GraphGym, so happy to see some activity back on this :)

rusty1s, Aug 11 '23

> > I would like to try this!
>
> Cool :) We were sadly a bit lazy in the further development of GraphGym, so happy to see some activity back on this :)

Okay, I will work on this starting Monday! I know how to code it; could you tell me exactly where to put the code, i.e., which files to edit?

rajveer43, Aug 11 '23

@rusty1s Is this still open? Can I contribute?

RagnarokAnsh, Sep 19 '23

This roadmap is in a fuzzy state right now. There are already a few PRs, such as https://github.com/pyg-team/pytorch_geometric/pull/5626, but I haven't had time to merge them yet.

rusty1s, Sep 21 '23