Recommended way to save graph data on TFRecords

Open imayachita opened this issue 2 years ago • 1 comments

Hey, I saw that the examples do not save the data as TFRecords. I am working on a project where we save the dataset as TFRecords. We have a lot of graphs, and at the moment we save each graph in its own TFRecord file. The bottleneck is the time it takes to load the TFRecord files, so we are thinking of saving multiple graphs in one TFRecord file. What is the recommended way to do this? In that case, the problem is that when we load with tf.data.Dataset, we can't find a way to batch per graph (we want 1 batch = 1 entire graph, otherwise the graphs will be truncated). Do you have any ideas? Thanks

imayachita avatar Aug 16 '22 12:08 imayachita

The basic advice for TF-GNN is to save the input graphs for training and validation in a TFRecord file, with each record being one GraphTensor serialized as a tf.Example proto. TensorFlow's usual techniques apply to reduce deserialization overhead at loading time: tune prefetch etc. of the tf.data pipeline and deserialize batches of input examples (after shuffling, if any). For more, please see https://github.com/tensorflow/gnn/blob/main/tensorflow_gnn/docs/guide/input_pipeline.md
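As a rough illustration of the advice above (not code from this thread), here is a minimal sketch using only core TensorFlow. It assumes a simplified, hypothetical graph schema (`node_feats`, `sources`, `targets`) rather than a real GraphTensor; with TF-GNN you would presumably serialize and parse with `tfgnn.write_example` and `tfgnn.parse_example` as described in the linked guide. The point it shows is that one file can hold many records, each record is one whole graph, and batches of serialized records can be parsed in one call without truncating any graph.

```python
import os
import tempfile

import tensorflow as tf


def graph_to_example(node_feats, sources, targets):
    """Packs one graph into one tf.train.Example (hypothetical flat schema)."""
    return tf.train.Example(features=tf.train.Features(feature={
        "node_feats": tf.train.Feature(
            float_list=tf.train.FloatList(value=node_feats)),
        "sources": tf.train.Feature(
            int64_list=tf.train.Int64List(value=sources)),
        "targets": tf.train.Feature(
            int64_list=tf.train.Int64List(value=targets)),
    }))


# Three toy graphs of different sizes: (node features, edge sources, edge targets).
graphs = [
    ([0.1, 0.2, 0.3], [0, 1], [1, 2]),
    ([1.0, 2.0], [0], [1]),
    ([5.0, 6.0, 7.0, 8.0], [0, 1, 2], [1, 2, 3]),
]

# Write all graphs into ONE TFRecord file, one record per graph.
path = os.path.join(tempfile.mkdtemp(), "graphs.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for graph in graphs:
        writer.write(graph_to_example(*graph).SerializeToString())

# Ragged features keep every graph at its full length inside a batch.
feature_spec = {
    "node_feats": tf.io.RaggedFeature(tf.float32),
    "sources": tf.io.RaggedFeature(tf.int64),
    "targets": tf.io.RaggedFeature(tf.int64),
}

dataset = (
    tf.data.TFRecordDataset(path)
    # A .shuffle(buffer_size) step would go here, before batching.
    .batch(2)  # batch the serialized strings first ...
    .map(lambda serialized: tf.io.parse_example(serialized, feature_spec),
         num_parallel_calls=tf.data.AUTOTUNE)  # ... then parse a batch at once
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset)
for batch in batches:
    # Each row of the ragged tensor is one complete graph.
    print(batch["sources"].to_list())
```

Because a batch here is a batch of records, setting the batch size to 1 yields exactly one entire graph per batch, regardless of how many graphs share the file.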

arnoegw avatar Sep 06 '22 13:09 arnoegw

I suppose this issue has been resolved, so I'm going to close it.

arnoegw avatar Mar 24 '23 09:03 arnoegw