lingvo

About the transmission between ps and trainer.

Open · shengzhang0222 opened this issue · 1 comment

When I used asynchronous training, I found that 1 Gbit/s of network bandwidth is not enough for the traffic between the parameter servers (ps) and the trainer. How can I solve this?

shengzhang0222 · Aug 21 '19 07:08

You can try switching the transferred values to bfloat16. That halves their size: 2 bytes per value instead of 4 for float32. During training, you may still want to keep the internal computation in float32 for better numerical stability.

drpngx · Oct 05 '19 10:10
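The bfloat16 suggestion can be illustrated with a small, framework-independent sketch. bfloat16 keeps float32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so a float32 can be shortened by rounding away its low 16 bits. The helper names `encode`/`decode` below are hypothetical and are not Lingvo's API. In TensorFlow, the corresponding cast is `tf.cast(x, tf.bfloat16)`. The sketch assumes the full-precision copy stays on the sender and only the transmitted copy is rounded.

```python
import math
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 (round-to-nearest-even); return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep NaN a quiet NaN instead of letting rounding turn it into infinity.
        return (bits >> 16) | 0x0040
    # Round to nearest even: add 0x7FFF plus the lowest bit that is kept,
    # then drop the low 16 bits of the float32 pattern.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def encode(values):
    """Hypothetical wire encoding: pack values as bfloat16, 2 bytes each."""
    return struct.pack(f"<{len(values)}H", *(float32_to_bfloat16_bits(v) for v in values))


def decode(payload):
    """Hypothetical receiver side: unpack bfloat16 values and widen them to float32 for compute."""
    n = len(payload) // 2
    return [bfloat16_bits_to_float32(b) for b in struct.unpack(f"<{n}H", payload)]


weights = [0.1, -2.5, 3.14159, 1000.0]
payload = encode(weights)
float32_payload = struct.pack(f"<{len(weights)}f", *weights)
print(len(payload), "bytes as bfloat16 vs", len(float32_payload), "bytes as float32")  # 8 vs 16
print(decode(payload))
```

The rounding costs precision: with 8 significant bits, each value's relative error is at most about 2^-9. This is why the reply suggests keeping the internal training computation in float32 and reducing precision only for what goes over the network.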