pytorch_geometric
                                
[Roadmap] GraphGym via PyTorch Lightning and Hydra 🚀
🚀 The feature, motivation and pitch
The overall goal of this roadmap is to ensure a tighter connection between PyG core and the GraphGym configuration manager. A further goal is to avoid re-inventing the wheel in GraphGym and to make use of popular open-source frameworks whenever applicable, e.g., for configuration management, training, logging, and AutoML.
As such, this roadmap is structured into several components: general improvements (e.g., a tighter connection between PyG and GraphGym), PyTorch Lightning integration, and Hydra integration as our configuration tool.
General Roadmap
- [ ] Add `register` functionality to models in PyG core
- [ ] Remove any layer/model definition of GraphGym and move it to PyG core
- [ ] Expose a `graphgym` bash script in a `bin/` folder - GraphGym usage should not require manual cloning of PyG
- [ ] Better and more user-friendly documentation
- [ ] Adding `HeteroData` support
- [ ] Adding pooling layers
- [ ] ...
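As a rough illustration of what `register` functionality for models could look like, here is a minimal, dependency-free sketch. It is not PyG's or GraphGym's actual implementation: `MODEL_REGISTRY`, `register_model`, `build_model` and `ToyGCN` are hypothetical names chosen for this example, and `ToyGCN` is a plain stand-in class rather than a real GNN.

```python
from typing import Any, Callable, Dict, Optional, Union

# Hypothetical global registry mapping a config name to a model class.
MODEL_REGISTRY: Dict[str, type] = {}


def register_model(
    name: str, cls: Optional[type] = None
) -> Union[type, Callable[[type], type]]:
    """Register `cls` under `name`; usable as a plain call or as a decorator."""

    def _register(target: type) -> type:
        if name in MODEL_REGISTRY:
            raise KeyError(f"Model '{name}' is already registered")
        MODEL_REGISTRY[name] = target
        return target

    # Direct call: register_model("Name", SomeClass)
    if cls is not None:
        return _register(cls)
    # Decorator usage: @register_model("Name")
    return _register


def build_model(name: str, **kwargs: Any) -> Any:
    """Look up a registered model by name and instantiate it with `kwargs`."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model '{name}'; known: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**kwargs)


@register_model("ToyGCN")
class ToyGCN:
    """Stand-in model that only stores its hyperparameters (assumption)."""

    def __init__(self, in_channels: int, hidden_channels: int,
                 out_channels: int, num_layers: int = 2) -> None:
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.num_layers = num_layers


# A configuration only needs the registered name plus keyword arguments.
model = build_model("ToyGCN", in_channels=34, hidden_channels=16, out_channels=4)
```

With a registry along these lines, a configuration file can refer to a model purely by its name, which is what moving layer/model definitions out of GraphGym and into PyG core would rely on.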
PyTorch Lightning Integration
The GraphGym training experience can be improved in terms of scalability, mixed precision support, logging, and checkpointing through PyTorch Lightning integration.
- [x] Integrate a `LightningModule` into GraphGym
- [x] Update train method with PL `Trainer` and the `LightningModule` implementations
- [ ] Refactor `load_ckpt` and `save_ckpt` with PL checkpoint save and load method
- [ ] Integrate `LightningDataset`, `LightningNodeData` and `LightningLinkData` modules
- [ ] ...
Hydra Integration
Users of PyG should be able to write GraphGym configurations that make full use of PyG functionality. In particular, we want to allow access to any dataset, any data transformation pipeline, and any GNN layer/model. For this, we need to follow a structured/composable configuration, e.g., as introduced here:
defaults:
  - dataset: KarateClub
  - [email protected]:
      - NormalizeFeatures
      - AddSelfLoops
  - model: GCN
  - optimizer: Adam
  - lr_scheduler: ReduceLROnPlateau
  - _self_
model:
  in_channels: 34
  out_channels: 4
  hidden_channels: 16
  num_layers: 2
- [ ] Use variable interpolation, e.g., `model.in_channels = ${dataset.num_features}` and `model.out_channels = ${dataset.num_classes}`
- [ ] ...
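To make the variable interpolation item more concrete, the following is a minimal sketch of what resolving `${dataset.num_features}` and `${dataset.num_classes}` amounts to. It is a hand-written, stdlib-only resolver for illustration, not Hydra's or OmegaConf's implementation, and it has no cycle detection. The helper names `lookup` and `resolve` are hypothetical, and the `dataset` values are written by hand to mirror `in_channels: 34` and `out_channels: 4` from the config above; in a real setup they would come from the dataset itself.

```python
import re
from typing import Any, Dict

# Matches references of the form ${some.dotted.key}.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'dataset.num_features' through nested dicts."""
    node: Any = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def _resolve(cfg: Dict[str, Any], node: Any) -> Any:
    if isinstance(node, dict):
        return {key: _resolve(cfg, value) for key, value in node.items()}
    if isinstance(node, list):
        return [_resolve(cfg, value) for value in node]
    if isinstance(node, str):
        whole = _REFERENCE.fullmatch(node)
        if whole:
            # A value that is exactly one reference keeps the referenced type.
            return _resolve(cfg, lookup(cfg, whole.group(1)))
        # References embedded in a longer string are substituted as text.
        return _REFERENCE.sub(
            lambda m: str(_resolve(cfg, lookup(cfg, m.group(1)))), node
        )
    return node


def resolve(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `cfg` with all ${...} references replaced."""
    return _resolve(cfg, cfg)


config = {
    # Assumed values for illustration only.
    "dataset": {"name": "KarateClub", "num_features": 34, "num_classes": 4},
    "model": {
        "name": "GCN",
        "in_channels": "${dataset.num_features}",
        "out_channels": "${dataset.num_classes}",
        "hidden_channels": 16,
        "num_layers": 2,
    },
}

resolved = resolve(config)
```

The point of the design is that the model section no longer hard-codes sizes that are determined by the dataset, so swapping the dataset does not require editing the model configuration.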
Weights & Biases Integration (TBD)
- [ ] ...
AutoML (TBD)
- [ ] ...
cc @pyg-team/biotax-team
Integrate `LightningDataset`, `LightningNodeData` and `LightningLinkData` modules
New here: what do LightningNodeData and LightningLinkData refer to?
Refactor `load_ckpt` and `save_ckpt` with PL checkpoint save and load method
Is this still needed after #4689?
@julian-q Welcome :) `LightningDataset`, `LightningNodeData` and `LightningLinkData` refer to our helper data modules to connect PyG with PL, see here. Currently, they are not used within GraphGym.
Is this still needed after https://github.com/pyg-team/pytorch_geometric/pull/4689?
I assume so. `load_ckpt` and `save_ckpt` don't look like they currently make use of PL checkpoints.
I would like to contribute to this task. I have previously worked on using PyTorch Lightning and Hydra together in this repo.
This is amazing. We should collect some information about how we want to integrate Hydra into GraphGym, as I believe we need a new config layout. I have started something a long time ago but did not finish it, see here, here and here. Would very much appreciate some advice and insights from you!
I'll spend some time going through the links you shared and start a draft PR regarding this. Hope to get your guidance on it as well :).
@rusty1s I would like to try this!
Cool :) We were sadly a bit lazy in the further development of GraphGym, so happy to see some activity back on this :)
Okay, I will work on this from Monday! I know how to code it. Could you just tell me exactly where to put the code, i.e., which files to edit?
@rusty1s Is this still open? Can I contribute?
This roadmap is in a fuzzy state right now. A few PRs already exist, such as https://github.com/pyg-team/pytorch_geometric/pull/5626, but I haven't really had time to merge it yet.