graph-learn
Does Graph-Learn support a distributed graph stored on a distributed filesystem?
Hello. The distributed training example of GraphSage at https://github.com/alibaba/graph-learn/blob/master/examples/tf/graphsage/dist_train.py seems to load the full graph (cora) on every machine node separately and then train with TensorFlow's ps/worker framework. Could the graph instead be stored on HDFS or some other distributed file system, so that each worker process only needs to load one partition of it? Also, what is the purpose of the tracker (the default is '/mnt/data/nfs/graph-learn/distributed/')?
Yes, this confused me a lot as well.
@Zarca @eedalong To load from HDFS or other distributed file systems, we would need to implement the corresponding interfaces; currently, only the local file system is supported. It is exactly as you said: all data is loaded into memory distributedly and then trained in worker-ps mode on top of TF.
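To make "loaded into memory distributedly" concrete, here is a minimal sketch of one way a cluster could split the input among servers. The function name and file names are purely illustrative, not graph-learn's actual API:

```python
# Hypothetical sketch: each server claims a disjoint subset of the data
# files by round-robin on file index, so together the servers hold the
# whole graph in memory. Not graph-learn's real loading code.

def files_for_server(all_files, server_index, server_count):
    """Return the files this server should load: the ones whose position
    (in sorted order) modulo server_count equals its index."""
    return [f for i, f in enumerate(sorted(all_files))
            if i % server_count == server_index]

files = ["edges_0.csv", "edges_1.csv", "edges_2.csv",
         "edges_3.csv", "edges_4.csv"]
print(files_for_server(files, 0, 2))  # server 0 of 2
print(files_for_server(files, 1, 2))  # server 1 of 2
```

Each file lands on exactly one server, so the union of all servers' partitions is the complete graph.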
To read the data files, we currently use a SEEK-based interface to partition the whole dataset, which is not friendly in some cases. We will change this behavior so that each server simply loads ONE partition without SEEK. This will be released soon.
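For readers unfamiliar with SEEK-based partitioning, here is a self-contained sketch of the general technique (my own illustration, not graph-learn's implementation): each server seeks to its byte range and aligns to the next newline so that no record is split between servers.

```python
import io
import os
import tempfile

def read_partition(path, index, count):
    """SEEK-style partitioning sketch: server `index` of `count` reads only
    the lines whose starting byte falls inside its slice of the file."""
    with open(path, "rb") as f:
        f.seek(0, io.SEEK_END)
        size = f.tell()
        start = size * index // count
        end = size * (index + 1) // count
        if start > 0:
            # Seek one byte back and discard the remainder of any line that
            # straddles the boundary; the previous server owns that line.
            f.seek(start - 1)
            f.readline()
        else:
            f.seek(0)
        lines = []
        while f.tell() < end:  # a line starting before `end` belongs to us
            line = f.readline()
            if not line:
                break
            lines.append(line.decode().rstrip("\n"))
        return lines

# Demo: split a tiny data file between two "servers".
tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False)
tmp.write("a\nbb\nccc\nd\n")
tmp.close()
part0 = read_partition(tmp.name, 0, 2)
part1 = read_partition(tmp.name, 1, 2)
os.unlink(tmp.name)
print(part0, part1)  # together they cover every line exactly once
```

The boundary-alignment logic is exactly what makes this approach fragile (e.g. for very long records or compressed files), which is presumably why the maintainers are moving to pre-split, one-partition-per-server inputs.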
As for the tracker, it is used to sync addresses and other state. We may not be able to assign an ip:port to each server at launch time, so self-discovery is necessary in such cases.
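A rough illustration of tracker-based self-discovery (a hypothetical sketch, not graph-learn's actual implementation): each server writes its endpoint to a file under the shared tracker directory, and everyone polls until all endpoints are visible.

```python
import os
import tempfile
import time

def register(tracker_dir, task_index, endpoint):
    """Publish this server's endpoint under the shared tracker directory."""
    with open(os.path.join(tracker_dir, "server_%d" % task_index), "w") as f:
        f.write(endpoint)

def wait_for_cluster(tracker_dir, server_count, timeout=10.0):
    """Poll the tracker directory until every server has registered,
    then return the full {index: endpoint} map."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        names = [n for n in os.listdir(tracker_dir)
                 if n.startswith("server_")]
        if len(names) == server_count:
            endpoints = {}
            for name in names:
                with open(os.path.join(tracker_dir, name)) as f:
                    endpoints[int(name.split("_")[1])] = f.read()
            return endpoints
        time.sleep(0.05)
    raise TimeoutError("not all servers registered in time")

# Demo: two servers registering (here in a single process for brevity).
tracker = tempfile.mkdtemp()
register(tracker, 0, "10.0.0.1:5000")
register(tracker, 1, "10.0.0.2:5000")
print(wait_for_cluster(tracker, 2))
```

This is why the default tracker path points at an NFS mount: the directory must be visible to every node so they can discover each other without pre-assigned addresses.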