Ability to reuse a PyG graph loaded from the database to create models

Open · ArthurKeen opened this issue 3 years ago · 1 comment

Defining and instantiating a model from a graph algorithm downloads the data from the source database every time you instantiate a model. This causes an unnecessary delay when you are testing model variations, i.e., variations of the hyper-parameters. For example, this commonly used fragment of code reloads the data from the database every time you run it:

```python
model = SAGE(db, arango_graph, metagraph, embedding_size=64)  # define graph embedding model
model._train(model, epochs=10)  # train
```
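One possible way to avoid the repeated download, sketched under assumptions: memoize the database fetch so that only the first call actually hits ArangoDB. `fetch_graph` below is a hypothetical stand-in for whatever the library does internally when a model is instantiated. It only counts calls and returns a placeholder dict, so the caching behavior can be seen without a database.

```python
from functools import lru_cache

FETCH_COUNT = 0  # counts how many times the (simulated) database is queried


@lru_cache(maxsize=None)
def fetch_graph(graph_name: str) -> dict:
    """Hypothetical loader standing in for downloading a graph from ArangoDB."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    # A real implementation would export the graph from the database here.
    return {"name": graph_name, "nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]}


# Five "model instantiations" trigger only one fetch; later calls hit the cache.
graphs = [fetch_graph("arango_graph") for _ in range(5)]
print(FETCH_COUNT)  # 1
```

Two caveats with this approach: every caller receives the same cached object, so a model that mutates its graph would affect the others; and `lru_cache` requires hashable arguments, so a dict-valued `metagraph` would have to be converted (for example to a tuple or a JSON string) before it could be part of the cache key.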

Imagine you want to test whether increasing the embedding size improves model performance (this could also be done with hyper-parameter optimization). You would want to import the data from the data source once and then keep reusing the local graph object:

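```python
models = []
for i in range(5):
    # reuse_data=True is the proposed option: reuse the local graph instead of re-downloading
    model = SAGE(db, arango_graph, metagraph, embedding_size=pow(2, i + 5), reuse_data=True)
    model._train(model, epochs=10)  # train
    models.append(model)
```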
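An alternative to a `reuse_data` flag, shown here only as a hedged sketch: load the graph once and pass the in-memory object to each model, so the model constructor never touches the database. `LocalGraph`, `SketchSAGE`, and `load_once` are hypothetical names invented for illustration; they are not part of the library, and `train` is a placeholder that only records the epoch count.

```python
from dataclasses import dataclass


@dataclass
class LocalGraph:
    """Hypothetical in-memory graph, exported from the database once."""
    nodes: list
    edges: list


@dataclass
class SketchSAGE:
    """Hypothetical model that takes an already-loaded graph instead of a db handle."""
    graph: LocalGraph
    embedding_size: int
    trained_epochs: int = 0

    def train(self, epochs: int) -> None:
        # Placeholder for a real training loop; it only records the epochs run.
        self.trained_epochs += epochs


def load_once() -> LocalGraph:
    # Hypothetical: in practice this would be the single database export.
    return LocalGraph(nodes=[0, 1, 2], edges=[(0, 1), (1, 2)])


graph = load_once()  # the only "download"
models = []
for i in range(5):
    model = SketchSAGE(graph, embedding_size=2 ** (i + 5))
    model.train(epochs=10)
    models.append(model)

print([m.embedding_size for m in models])  # [32, 64, 128, 256, 512]
```

Passing the loaded graph explicitly makes the reuse visible at the call site, whereas a flag hides where the cached data lives and when it becomes stale.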

Sep 30 '22 18:09 ArthurKeen

This depends on the PyG adapter, so it will be added in a future iteration.

Nov 07 '22 15:11 sachinsharma9780