lingvo
About the transmission between ps and trainer.
When I use asynchronous training, I find that a gigabit network connection does not provide enough bandwidth for the traffic between the ps and the trainer. How can I solve this?
You can try switching to bfloat16. It is half the size of float32, so it roughly halves the data sent between the ps and the trainer. You may still want to keep the training computation itself in float32 for better stability.
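To illustrate why this halves the traffic, here is a minimal standard-library sketch of the idea: bfloat16 is essentially the upper 16 bits of a float32, so values can be narrowed just before they go on the wire and widened back to float32 on the other side. The function names (`encode_for_wire`, `decode_from_wire`, and so on) are hypothetical and are not part of lingvo; in a real TensorFlow setup you would more likely rely on `tf.cast` with `tf.bfloat16` or on whatever option your configuration exposes.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even.

    Assumes x fits in float32 (struct.pack raises OverflowError otherwise).
    """
    if x != x:  # NaN: return a canonical quiet NaN instead of rounding it
        return 0x7FC0
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add a bias so that dropping the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float32 value."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def encode_for_wire(values):
    """Pack values as bfloat16: 2 bytes per value instead of 4."""
    patterns = [float32_to_bfloat16_bits(v) for v in values]
    return struct.pack(f"<{len(patterns)}H", *patterns)


def decode_from_wire(payload):
    """Unpack a bfloat16 payload into float32-valued Python floats."""
    count = len(payload) // 2
    patterns = struct.unpack(f"<{count}H", payload)
    return [bfloat16_bits_to_float32(b) for b in patterns]


if __name__ == "__main__":
    weights = [1.0, 0.5, -2.0, 3.14159]
    as_float32 = struct.pack(f"<{len(weights)}f", *weights)
    as_bfloat16 = encode_for_wire(weights)
    print(len(as_float32), len(as_bfloat16))
    print(decode_from_wire(as_bfloat16))
```

The receiver widens the values back to float32 before using them, which matches the advice to keep the internal computation in float32. The cost is precision: bfloat16 keeps only 8 significant bits, so a value like 3.14159 comes back slightly rounded.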