network utilization

Open zhengpeirong opened this issue 9 months ago • 3 comments

Let's calculate the transfer time theoretically.

llama3 8B

The original experiment data is here. Since the link is full-duplex, uplink and downlink do not interfere with each other, so we can take the larger of the two volumes, 510 kB, as the transfer data volume when calculating the transfer time.

510000 × 8 bit / 1 Gbps = 4.08 ms
4.08 ms / 199.60 ms ≈ 2%

So, the average transfer time should ideally be 4.08 ms. However, your measured result is 199.60 ms, roughly 50 times higher, so the network utilization ratio is only about 2%.
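The arithmetic above can be sketched as a small helper. This is only an illustration: the function names `ideal_transfer_ms` and `utilization` are hypothetical, and it assumes 1 kB = 1000 bytes and a link that delivers its full nominal bandwidth with no protocol overhead.

```python
def ideal_transfer_ms(payload_bytes: float, bandwidth_bps: float) -> float:
    """Time to push payload_bytes over a link of bandwidth_bps, in milliseconds.

    Assumes the link runs at its full nominal rate (no protocol overhead).
    """
    return payload_bytes * 8 / bandwidth_bps * 1e3


def utilization(payload_bytes: float, bandwidth_bps: float, measured_ms: float) -> float:
    """Ratio of the ideal transfer time to the measured transfer time."""
    return ideal_transfer_ms(payload_bytes, bandwidth_bps) / measured_ms


# llama3 8B figures quoted in the post: 510 kB, 1 Gbps link, 199.60 ms measured.
ideal = ideal_transfer_ms(510_000, 1e9)
ratio = utilization(510_000, 1e9, 199.60)
print(f"ideal = {ideal:.2f} ms, utilization = {ratio:.1%}")
```

With the numbers from the post this prints an ideal time of 4.08 ms and a utilization of about 2%.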

llama2 7B

For comparison, I summarize a similar model (llama2 7B) using different devices: image

VMs

According to this discussion, the network bandwidth is 20 Gbps (reference here). image

590000 × 8 bit / 20 Gbps = 0.236 ms
0.236 ms / 7.62 ms ≈ 3%

So, the network utilization ratio is merely 3%. Similarly, we can calculate the result of 4 VMs to be 6%.
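The same ratio can also be read as an effective throughput, which may be easier to compare against the link speed. The snippet below is a rough sketch: `effective_throughput_bps` is a hypothetical helper, and it assumes the whole 7.62 ms is spent moving the 590 kB payload (1 kB = 1000 bytes).

```python
def effective_throughput_bps(payload_bytes: float, measured_ms: float) -> float:
    """Bits per second actually achieved if payload_bytes took measured_ms to move."""
    return payload_bytes * 8 / (measured_ms / 1e3)


# VM figures quoted in the post: 590 kB, 7.62 ms measured, 20 Gbps nominal link.
LINK_BPS = 20e9
achieved = effective_throughput_bps(590_000, 7.62)
print(f"{achieved / 1e9:.3f} Gbps of {LINK_BPS / 1e9:.0f} Gbps -> {achieved / LINK_BPS:.1%}")
```

Under these assumptions the VMs achieve roughly 0.62 Gbps out of 20 Gbps, which matches the 3% figure above.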

RaspberryPi

image

Likewise, the network utilization of the Raspberry Pi cluster works out to 9.0%, 48.0%, and 14.1% for 2, 4, and 8 devices.

  • llama2 13B: 23.9%, 25.75%, 9.8%
  • llama2 70B: 8.5%

Summary

I think the network utilization, which averages around 11% and ranges from 2% to 48%, is under-optimized. Further work on the code could possibly achieve a stable and high network utilization.

Originally posted by @zhengpeirong in https://github.com/b4rtaz/distributed-llama/discussions/41#discussioncomment-9480575

zhengpeirong, May 18 '24 16:05