
Servers hang when changing inter_thread_num

liaocz opened this issue · 2 comments

Hi, when I test performance with the graph-learn framework and set inter_thread_num to 64 or greater via gl.set_inter_threadnum(64), all the servers hang while initializing the graph data, and the workers keep waiting for the servers to become ready.
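For context, a minimal sketch of where the setting sits relative to graph initialization. The Graph construction and the init call below are illustrative assumptions based on graph-learn's distributed examples, not lines from my actual script; only set_inter_threadnum itself is the call in question:

```python
import graphlearn as gl

# The setting in question: values >= 64 trigger the hang,
# while the default of 32 does not.
gl.set_inter_threadnum(64)

# Illustrative placeholder for the usual distributed setup
# (assumed API shape; see examples/tf/graphsage/dist_train.py
# for the real construction with node()/edge() sources).
g = gl.Graph()
g.init(task_index=0, job_name="server",
       cluster={"server_count": 2, "client_count": 2,
                "tracker": "/tmp/tracker"})  # servers hang at this step
```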

liaocz · Jun 04 '20

We could not reproduce your problem. Would you mind supplying more details, such as the data size, the cluster configuration, and so on?

Besides, performance does not always improve as the thread count grows; 16 and 32 are usually practical values.

jackonan · Jun 09 '20

@jackonan I have repeated the test for this problem. The procedure is as follows: I use dist_train.py (examples/tf/graphsage/dist_train.py) to test distributed mode with 2 parameter servers and 2 workers.

  • With gl.set_inter_threadnum(32) in dist_train.py (32 is also the default value in the source code), training proceeds normally.
  • With gl.set_inter_threadnum(64) in dist_train.py, the servers hang during data initialization (see the sketch below).
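Concretely, the only difference between the two runs is this one line in dist_train.py; everything else in the script is unchanged (sketch of the edit, not a verbatim excerpt):

```python
import graphlearn as gl

# Run 1: default thread count -- training proceeds normally.
gl.set_inter_threadnum(32)

# Run 2: swapping in the line below reproduces the hang during
# graph-data initialization on both servers.
# gl.set_inter_threadnum(64)
```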

liaocz · Jun 11 '20