
Servers hang when changing inter_thread_num

liaocz opened this issue · 2 comments

Hi, when I test performance with the graph-learn framework and set inter_thread_num to 64 or greater via gl.set_inter_threadnum(64), all the servers hang while initializing the graph data, and the workers keep waiting for the servers to become ready.
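For context, a minimal sketch of where the setting sits relative to graph initialization. The Graph construction and the init call below are illustrative assumptions based on graph-learn's distributed examples, not lines from my actual script; only set_inter_threadnum itself is the call in question:

```python
import graphlearn as gl

# The setting in question: values >= 64 trigger the hang,
# while the default of 32 does not.
gl.set_inter_threadnum(64)

# Illustrative placeholder for the usual distributed setup
# (assumed API shape; see examples/tf/graphsage/dist_train.py
# for the real construction with node()/edge() sources).
g = gl.Graph()
g.init(task_index=0, job_name="server",
       cluster={"server_count": 2, "client_count": 2,
                "tracker": "/tmp/tracker"})  # servers hang at this step
```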

liaocz · Jun 04 '20

We could not reproduce your problem. Would you mind supplying more details, such as the data size, the cluster configuration, and so on?

Besides, performance does not always improve as the thread count grows; 16 and 32 are usually practical values.

jackonan · Jun 09 '20

@jackonan I have repeated the test for this problem. The procedure is as follows: I use dist_train.py (examples/tf/graphsage/dist_train.py) to test distributed mode with 2 parameter servers and 2 workers.

  • With gl.set_inter_threadnum(32) in dist_train.py (32 is also the default value in the source code), training proceeds normally.
  • With gl.set_inter_threadnum(64) in dist_train.py, the servers hang during data initialization (see the sketch below).
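Concretely, the only difference between the two runs is this one line in dist_train.py; everything else in the script is unchanged (sketch of the edit, not a verbatim excerpt):

```python
import graphlearn as gl

# Run 1: default thread count -- training proceeds normally.
gl.set_inter_threadnum(32)

# Run 2: swapping in the line below reproduces the hang during
# graph-data initialization on both servers.
# gl.set_inter_threadnum(64)
```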

liaocz · Jun 11 '20