graph-learn
Does Graph-Learn support a distributed graph stored on a distributed filesystem?
Hello. The distributed training example of GraphSage at https://github.com/alibaba/graph-learn/blob/master/examples/tf/graphsage/dist_train.py seems to load the full graph (cora) on every machine node separately and then train with TensorFlow's ps/worker framework. Could the graph instead be stored on HDFS or some other distributed file system, so that each worker process only needs to load one partition of it? Also, what is the purpose of the tracker (the default is '/mnt/data/nfs/graph-learn/distributed/')?
Yes, this confused me a lot as well.
@Zarca @eedalong To load from HDFS or other distributed file systems, we would need to implement the corresponding interfaces; currently, only the local file system is supported. It is exactly as you said: all data is loaded into memory distributedly and then trained in worker-ps mode on top of TF.
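To make "loaded into memory distributedly" concrete, here is a minimal sketch of one way a cluster could split the input among servers. The function name and file names are purely illustrative, not graph-learn's actual API:

```python
# Hypothetical sketch: each server claims a disjoint subset of the data
# files by round-robin on file index, so together the servers hold the
# whole graph in memory. Not graph-learn's real loading code.

def files_for_server(all_files, server_index, server_count):
    """Return the files this server should load: the ones whose position
    (in sorted order) modulo server_count equals its index."""
    return [f for i, f in enumerate(sorted(all_files))
            if i % server_count == server_index]

files = ["edges_0.csv", "edges_1.csv", "edges_2.csv",
         "edges_3.csv", "edges_4.csv"]
print(files_for_server(files, 0, 2))  # server 0 of 2
print(files_for_server(files, 1, 2))  # server 1 of 2
```

Each file lands on exactly one server, so the union of all servers' partitions is the complete graph.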
To read the data files, we currently use a SEEK-based interface to partition the whole dataset, which is not friendly in some cases. We will change this behavior so that each server simply loads ONE partition without SEEK. This will be released soon.
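For readers unfamiliar with SEEK-based partitioning, here is a self-contained sketch of the general technique (my own illustration, not graph-learn's implementation): each server seeks to its byte range and aligns to the next newline so that no record is split between servers.

```python
import io
import os
import tempfile

def read_partition(path, index, count):
    """SEEK-style partitioning sketch: server `index` of `count` reads only
    the lines whose starting byte falls inside its slice of the file."""
    with open(path, "rb") as f:
        f.seek(0, io.SEEK_END)
        size = f.tell()
        start = size * index // count
        end = size * (index + 1) // count
        if start > 0:
            # Seek one byte back and discard the remainder of any line that
            # straddles the boundary; the previous server owns that line.
            f.seek(start - 1)
            f.readline()
        else:
            f.seek(0)
        lines = []
        while f.tell() < end:  # a line starting before `end` belongs to us
            line = f.readline()
            if not line:
                break
            lines.append(line.decode().rstrip("\n"))
        return lines

# Demo: split a tiny data file between two "servers".
tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False)
tmp.write("a\nbb\nccc\nd\n")
tmp.close()
part0 = read_partition(tmp.name, 0, 2)
part1 = read_partition(tmp.name, 1, 2)
os.unlink(tmp.name)
print(part0, part1)  # together they cover every line exactly once
```

The boundary-alignment logic is exactly what makes this approach fragile (e.g. for very long records or compressed files), which is presumably why the maintainers are moving to pre-split, one-partition-per-server inputs.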
As for the tracker, it is used to sync addresses and other state. We may not be able to assign an ip:port to each server at launch time, so self-discovery is necessary in such cases.
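A rough illustration of tracker-based self-discovery (a hypothetical sketch, not graph-learn's actual implementation): each server writes its endpoint to a file under the shared tracker directory, and everyone polls until all endpoints are visible.

```python
import os
import tempfile
import time

def register(tracker_dir, task_index, endpoint):
    """Publish this server's endpoint under the shared tracker directory."""
    with open(os.path.join(tracker_dir, "server_%d" % task_index), "w") as f:
        f.write(endpoint)

def wait_for_cluster(tracker_dir, server_count, timeout=10.0):
    """Poll the tracker directory until every server has registered,
    then return the full {index: endpoint} map."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        names = [n for n in os.listdir(tracker_dir)
                 if n.startswith("server_")]
        if len(names) == server_count:
            endpoints = {}
            for name in names:
                with open(os.path.join(tracker_dir, name)) as f:
                    endpoints[int(name.split("_")[1])] = f.read()
            return endpoints
        time.sleep(0.05)
    raise TimeoutError("not all servers registered in time")

# Demo: two servers registering (here in a single process for brevity).
tracker = tempfile.mkdtemp()
register(tracker, 0, "10.0.0.1:5000")
register(tracker, 1, "10.0.0.2:5000")
print(wait_for_cluster(tracker, 2))
```

This is why the default tracker path points at an NFS mount: the directory must be visible to every node so they can discover each other without pre-assigned addresses.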