Optimize the speed of concurrent get of PyTorch models
Describe your problem
Currently, getting a PyTorch model at high concurrency is very slow, as shown below. The maximum network bandwidth of both test machines is 30 Gbps.
Vineyard
| Concurrency | Time to Get | Observed Network Bandwidth (dstat) |
|---|---|---|
| 1 | 2.57s | ~2000 MiB/s |
| 6 | 7.73s | ~3800 MiB/s |
| 13 | 14.58s | ~3800 MiB/s |
| 27 | 29.32s | ~3800 MiB/s |
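For reference, a minimal sketch of how such a concurrent-get measurement could be reproduced is shown below. The socket path, concurrency value, and the way the object ID is passed in are assumptions for illustration, not the exact script used for the numbers above.

```python
import sys
import time
from multiprocessing import Process

import vineyard

SOCKET = '/var/run/vineyard.sock'  # assumed IPC socket of the local vineyardd
CONCURRENCY = 13                   # matches one of the rows in the table above


def fetch_model(object_id_repr):
    """Connect to vineyardd and time a single get of the stored model."""
    client = vineyard.connect(SOCKET)
    start = time.time()
    client.get(vineyard.ObjectID(object_id_repr))  # resolve the stored model blobs
    print('get finished in %.2fs' % (time.time() - start))


if __name__ == '__main__':
    # the object id of the previously stored model is passed on the command line
    procs = [Process(target=fetch_model, args=(sys.argv[1],))
             for _ in range(CONCURRENCY)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```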
Iperf
| Concurrency | Observed Network Bandwidth (dstat) | Total Network Bandwidth (iperf) |
|---|---|---|
| 1 | ~1470 MiB/s | 12 Gbit/s (1500 MB/s) |
| 6 | ~3700 MiB/s | 31.1 Gbit/s (3888 MB/s) |
| 13 | ~3650 MiB/s | 30.9 Gbit/s (3863 MB/s) |
| 27 | ~3650 MiB/s | 30.9 Gbit/s (3863 MB/s) |
Solution
In real-world scenarios, PyTorch models are usually loaded on machines with GPUs, which typically have high-performance networks. Thus, the bandwidth of a single vineyardd instance becomes the bottleneck. We can distribute the PyTorch model blobs across different Vineyard instances to increase the aggregate network bandwidth, as sketched below.
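A rough sketch of this direction, assuming hypothetical helper names (`put_sharded`, `get_sharded`) and illustrative instance endpoints, might look like the following; the round-robin placement and the endpoint list are assumptions for illustration, not the final design.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import torch
import vineyard

# Illustrative endpoints of several vineyardd instances; a real deployment would
# discover these from the cluster configuration.
ENDPOINTS = [
    '/var/run/vineyard-0.sock',
    '/var/run/vineyard-1.sock',
    '/var/run/vineyard-2.sock',
]


def put_sharded(state_dict):
    """Spread the tensors of a model's state_dict across multiple vineyardd instances."""
    clients = [vineyard.connect(ep) for ep in ENDPOINTS]
    shards = {}  # parameter name -> (instance index, object id)
    for i, (name, tensor) in enumerate(state_dict.items()):
        idx = i % len(clients)  # naive round-robin placement across instances
        shards[name] = (idx, clients[idx].put(tensor.cpu().numpy()))
    return shards


def get_sharded(shards):
    """Fetch the shards in parallel, one worker and one connection per instance."""
    def fetch_from(idx):
        client = vineyard.connect(ENDPOINTS[idx])
        return {name: torch.from_numpy(np.array(client.get(oid)))
                for name, (i, oid) in shards.items() if i == idx}

    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        merged = {}
        for part in pool.map(fetch_from, range(len(ENDPOINTS))):
            merged.update(part)
    return merged
```

Since each worker fetches its shards from a different instance, the aggregate throughput of a single get can scale with the number of instances instead of being capped by one vineyardd's network link.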