Optimize the speed of concurrent get of PyTorch models
Describe your problem
Currently, getting a PyTorch model at high concurrency is very slow, as the measurements below show. The maximum network bandwidth of both test machines is 30 Gbps.
Vineyard
| Concurrency | Time to get | Observed network bandwidth (dstat) |
|---|---|---|
| 1 | 2.57 s | ~2000 MiB/s |
| 6 | 7.73 s | ~3800 MiB/s |
| 13 | 14.58 s | ~3800 MiB/s |
| 27 | 29.32 s | ~3800 MiB/s |
Iperf
| Concurrency | Observed network bandwidth (dstat) | Total bandwidth reported by iperf |
|---|---|---|
| 1 | ~1470 MiB/s | 12 Gbit/s (~1500 MB/s) |
| 6 | ~3700 MiB/s | 31.1 Gbit/s (~3888 MB/s) |
| 13 | ~3650 MiB/s | 30.9 Gbit/s (~3863 MB/s) |
| 27 | ~3650 MiB/s | 30.9 Gbit/s (~3863 MB/s) |
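
For reference, a minimal sketch of what such a concurrent-get measurement can look like. The socket path, payload size, and concurrency level are illustrative assumptions, not the exact benchmark setup; in the measurement above the getters presumably ran on a different machine from vineyardd, so the payload crossed the network rather than local shared memory.

```python
# Hypothetical concurrent-get micro-benchmark: store one large array as a
# stand-in for the model weights, then fetch it from N workers and time each get.
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import vineyard

SOCKET = '/var/run/vineyard.sock'   # assumed vineyardd endpoint
CONCURRENCY = 6                     # matches one of the rows above

# Stand-in for the PyTorch model payload (~800 MiB of float64).
writer = vineyard.connect(SOCKET)
object_id = writer.put(np.random.rand(100, 1024, 1024))

def timed_get(_):
    # One connection per worker so the gets really run concurrently.
    client = vineyard.connect(SOCKET)
    start = time.time()
    client.get(object_id)
    return time.time() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    print('per-get seconds:', list(pool.map(timed_get, range(CONCURRENCY))))
```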
Solution
In practice, PyTorch models are usually loaded on GPU machines, which typically have high-performance networks. Thus, the network bandwidth of a single vineyardd instance is the bottleneck. We can distribute the PyTorch model blobs among different Vineyard instances to increase the aggregate network bandwidth.
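
The following is a minimal sketch of this idea, assuming several vineyard endpoints are reachable and that `put`/`get` of numpy arrays is available on each connection; the endpoint paths, helper names, and the simple round-robin placement are illustrative assumptions, not the final design.

```python
# Hypothetical sketch of the proposed direction: shard a PyTorch state_dict
# across several vineyard instances (round-robin) and gather the shards
# concurrently, so that the aggregate bandwidth of all instances is used
# instead of a single vineyardd's.
from concurrent.futures import ThreadPoolExecutor

import torch
import vineyard

# Assumption: one endpoint per vineyard instance; adapt to the deployment
# (IPC socket paths for co-located clients, or host/port pairs for RPC clients).
ENDPOINTS = ['/var/run/vineyard.sock.0', '/var/run/vineyard.sock.1']


def put_model(model: torch.nn.Module):
    """Distribute each state_dict tensor to a different instance (round-robin)."""
    clients = [vineyard.connect(ep) for ep in ENDPOINTS]
    placement = {}  # tensor name -> (endpoint index, vineyard ObjectID)
    for i, (name, tensor) in enumerate(model.state_dict().items()):
        idx = i % len(clients)
        placement[name] = (idx, clients[idx].put(tensor.detach().cpu().numpy()))
    return placement


def get_state_dict(placement):
    """Fetch all shards in parallel, one connection per fetch for thread safety."""
    def fetch(item):
        name, (idx, oid) = item
        client = vineyard.connect(ENDPOINTS[idx])
        return name, torch.as_tensor(client.get(oid))

    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        return dict(pool.map(fetch, placement.items()))
```

Restoring the model is then a plain `model.load_state_dict(get_state_dict(placement))`; the key point is that each vineyard instance serves only a fraction of the bytes, so concurrent getters are no longer limited by one instance's link.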