Recommended way to save graph data on TFRecords
Hey,
I saw that the examples do not save the data in TFRecords. I am working on a project where we save the dataset in TFRecords. We have a lot of graphs, and at the moment we save each graph in its own TFRecord. The bottleneck is the time to load the TFRecords, so we are thinking of saving multiple graphs in one TFRecord. What is the recommended way to do this? In that case, the issue is that when we load using tf.data.Dataset, we can't find a way to batch them per graph (we want 1 batch = 1 entire graph, otherwise the graphs will be truncated). Do you have any idea? Thanks
The basic advice for TF-GNN is to save the input graphs for training and validation in a TFRecord file, with each record being one GraphTensor serialized as a tf.Example proto. Because each record holds one whole graph, reading the file yields complete graphs, and batching groups whole graphs rather than cutting them. TensorFlow's usual techniques apply to reduce deserialization overhead at loading time: tune the tf.data pipeline (prefetching and similar settings) and deserialize batches of input examples rather than single records (after shuffling, if any). For more, please see https://github.com/tensorflow/gnn/blob/main/tensorflow_gnn/docs/guide/input_pipeline.md
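To illustrate the "one file, many records, one graph per record" layout, here is a minimal sketch using only core TensorFlow. The feature names (`node_features`, `sources`, `targets`), the toy graphs, and the helper `graph_to_example` are assumptions made up for this example, not part of TF-GNN. In a real TF-GNN setup the tf.Example would instead be produced from a GraphTensor and parsed with a GraphTensorSpec, as described in the linked guide.

```python
import os
import tempfile

import tensorflow as tf


def graph_to_example(node_features, sources, targets):
    """Packs one graph into one tf.train.Example (hypothetical feature layout)."""
    return tf.train.Example(features=tf.train.Features(feature={
        "node_features": tf.train.Feature(
            float_list=tf.train.FloatList(value=node_features)),
        "sources": tf.train.Feature(
            int64_list=tf.train.Int64List(value=sources)),
        "targets": tf.train.Feature(
            int64_list=tf.train.Int64List(value=targets)),
    }))


# Three toy graphs of different sizes: (node features, edge sources, edge targets).
graphs = [
    ([0.1, 0.2], [0], [1]),
    ([0.3, 0.4, 0.5], [0, 1], [1, 2]),
    ([0.6], [], []),
]

# Write all graphs into ONE TFRecord file, one record per graph.
path = os.path.join(tempfile.mkdtemp(), "graphs.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for graph in graphs:
        writer.write(graph_to_example(*graph).SerializeToString())

# Variable-length features, since every graph has its own size.
feature_spec = {
    "node_features": tf.io.VarLenFeature(tf.float32),
    "sources": tf.io.VarLenFeature(tf.int64),
    "targets": tf.io.VarLenFeature(tf.int64),
}


def parse_batch(serialized):
    # Deserialize a whole batch of records in one call.
    return tf.io.parse_example(serialized, feature_spec)


dataset = (
    tf.data.TFRecordDataset([path])  # each element is one record = one graph
    .batch(2)                        # groups whole graphs, never splits one
    .map(parse_batch, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)

# Recover the number of nodes of every graph to check nothing was truncated.
nodes_per_graph = []
for batch in dataset:
    ragged = tf.RaggedTensor.from_sparse(batch["node_features"])
    nodes_per_graph.extend(ragged.row_lengths().numpy().tolist())

print(nodes_per_graph)
```

The batch size here counts graphs, not nodes or edges, so a batch of 2 always contains two complete graphs regardless of their sizes. Parsing after `.batch(...)` rather than before it is one way to apply the "deserialize batches of input examples" advice above.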
I suppose this issue has been resolved, so I'm going to close it.