lingvo
About the transmission between ps and trainer.
When I used asynchronous training, I found that a Gigabit network connection does not provide enough bandwidth for the traffic between the ps (parameter server) and the trainer. How can I solve this?
You can try switching to bfloat16, which is half the size of float32 and so halves the amount of data sent between the ps and the trainer. During training, you may still want to keep the computation internally in float32 for more stability.
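To make the size argument concrete, here is a minimal, framework-free sketch of the idea. It is not Lingvo's API. In TensorFlow, which Lingvo is built on, the usual equivalent is a `tf.cast` to `tf.bfloat16` before sending and back to `tf.float32` after receiving. The helper names `float32_to_bfloat16_bits` and `bfloat16_bits_to_float32` are hypothetical. bfloat16 keeps float32's sign and 8-bit exponent and truncates the mantissa to 7 bits, so a value can be encoded by rounding away the low 16 bits of its float32 bit pattern. The receiver widens it back to float32 before doing any math.

```python
import struct
from array import array


def float32_to_bfloat16_bits(values):
    """Hypothetical helper: encode float32 values as bfloat16 bit patterns (uint16).

    Uses round-to-nearest-even on the dropped low 16 bits.
    """
    out = array("H")  # 2 bytes per element
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        # Keep NaN a NaN: truncation alone could zero out the mantissa.
        if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
            out.append((bits >> 16) | 0x0040)
            continue
        # Round to nearest, ties to even, on the 16 bits being discarded.
        rounding_bias = 0x7FFF + ((bits >> 16) & 1)
        out.append(((bits + rounding_bias) >> 16) & 0xFFFF)
    return out


def bfloat16_bits_to_float32(bits16):
    """Hypothetical helper: widen bfloat16 bit patterns back to float32 (exact)."""
    out = array("f")  # 4 bytes per element
    for b in bits16:
        out.append(struct.unpack("<f", struct.pack("<I", b << 16))[0])
    return out


# Example "parameters" held in float32 on the ps.
weights = array("f", [0.1, -1.5, 3.14159, 1000.0])

# What would go over the wire: half the bytes of the float32 payload.
encoded = float32_to_bfloat16_bits(weights)
print(len(weights) * weights.itemsize, "bytes as float32")
print(len(encoded) * encoded.itemsize, "bytes as bfloat16")

# The trainer widens back to float32 and keeps computing in float32.
decoded = bfloat16_bits_to_float32(encoded)
print(list(decoded))
```

The trade-off is precision. With 8 significant bits, each transmitted value carries a relative rounding error of at most about 2^-8 (roughly 0.4%). This is why it is reasonable to compress only what goes over the network and keep the update arithmetic itself in float32.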