graph-learn
Servers hang when changing inter_thread_num
Hi
When I test performance with the graph-learn framework and set inter_thread_num to 64 or greater via gl.set_inter_threadnum(64), all the servers hang while initializing the graph data, and the workers keep waiting for the servers to become ready.
We could not reproduce your problem. Would you mind supplying more details, such as the size of the data, the cluster configuration, and so on?
Besides, performance will not always improve as the thread number grows; 16 and 32 are usually practical values.
@jackonan I have repeated the test for this problem. The procedure is as follows: I use dist_train.py (examples/tf/graphsage/dist_train.py) to test distributed mode with 2 ps and 2 workers (see the sketch after this list).
- With gl.set_inter_threadnum(32) in dist_train.py (32 is also the default value in the source code), training runs normally.
- With gl.set_inter_threadnum(64) in dist_train.py, the servers hang during data initialization.
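
For context, a minimal sketch of the kind of distributed setup described above, assuming graph-learn's g.init(cluster=..., job_name=..., task_index=...) API; the host addresses, table paths, and decoder schema below are placeholders, not the ones actually used by examples/tf/graphsage/dist_train.py.

```python
import graphlearn as gl

# The setting under test: 32 (the default) works, 64 reportedly hangs.
gl.set_inter_threadnum(64)

# Placeholder cluster spec: 2 servers (ps) and 2 clients (workers).
# The exact cluster format may differ from what dist_train.py builds.
cluster = {
    "server": "host1:8888,host2:8888",
    "client": "host3:8889,host4:8889",
}

g = gl.Graph()
# Placeholder node/edge tables and schema.
g.node("node_table", node_type="item",
       decoder=gl.Decoder(attr_types=["float"] * 10))
g.edge("edge_table", edge_type=("item", "item", "relation"),
       decoder=gl.Decoder(weighted=True))

# On a server (ps) task, graph data is loaded here; this is the phase
# where the hang is observed when inter_thread_num is raised to 64.
g.init(cluster=cluster, job_name="server", task_index=0)

# On a worker task, init blocks until the servers report ready, which is
# why the workers appear to be stuck waiting for the servers.
# g.init(cluster=cluster, job_name="client", task_index=0)
```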