distributed-llama
network utilization
Let's estimate the theoretical transfer time.
llama3 8B
The original experiment data is here.
Since the link is full-duplex, uplink and downlink do not interfere with each other.
So, we can take the larger of the two volumes, 510 kB, as the data volume for calculating the transfer time.
510000 * 8 bit / 1 Gbps = 4.08 ms
So, the average transfer time should be about 4.08 ms. However, your result is 199.60 ms, roughly 50 times higher.
4.08 ms / 199.60 ms ≈ 2%
So, the network utilization ratio is merely 2%.
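The arithmetic above can be reproduced with a small script. This is only a sketch of my estimate: the function names `transfer_time_ms` and `network_utilization` are made up for illustration and are not part of distributed-llama, and it assumes the full nominal 1 Gbps is available with no protocol overhead.

```python
def transfer_time_ms(payload_bytes: float, bandwidth_bps: float) -> float:
    """Ideal time (ms) to move payload_bytes over a link of bandwidth_bps."""
    # bytes -> bits, divide by bits per second, then seconds -> milliseconds
    return payload_bytes * 8 / bandwidth_bps * 1000


def network_utilization(payload_bytes: float, bandwidth_bps: float,
                        measured_ms: float) -> float:
    """Ratio of the ideal transfer time to the measured transfer time."""
    return transfer_time_ms(payload_bytes, bandwidth_bps) / measured_ms


# Numbers from the llama3 8B experiment: 510 kB over a 1 Gbps link,
# measured average transfer time 199.60 ms.
ideal_ms = transfer_time_ms(510_000, 1e9)
util = network_utilization(510_000, 1e9, 199.60)

print(f"ideal transfer time: {ideal_ms:.2f} ms")
print(f"network utilization: {util:.1%}")
```

Swapping in a different payload size, bandwidth, or measured time gives the estimate for another setup.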
llama2 7B
For comparison, here are the results for a similar model (llama2 7B) on different devices:
VMs
In this discussion, the network bandwidth is 20 Gbps (reference here).
590000 * 8 bit / 20 Gbps = 0.236 ms
0.236 ms / 7.62 ms ≈ 3%
So, the network utilization ratio is merely 3%. Similarly, the result for 4 VMs works out to 6%.
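Written out as a formula, the estimate I am using is the following. The symbols are my own shorthand, not from the original discussion: $S$ is the transferred data volume in bytes, $B$ is the link bandwidth in bit/s, and $T_{\text{measured}}$ is the measured average transfer time. It assumes the nominal bandwidth is fully usable.

$$
U = \frac{8S / B}{T_{\text{measured}}}
\qquad\Rightarrow\qquad
U_{\text{2 VMs}} = \frac{8 \cdot 590000 \,/\, (20 \times 10^{9})\ \text{s}}{7.62 \times 10^{-3}\ \text{s}}
= \frac{0.236\ \text{ms}}{7.62\ \text{ms}} \approx 3\%
$$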
RaspberryPi
Also, the results for the Raspberry Pi cluster are calculated to be 9.0%, 48.0%, and 14.1% for 2, 4, and 8 devices, respectively. For the other models:
- llama2 13B: 23.9%, 25.75%, 9.8%
- llama2 70B: 8.5%
Summary
I think the network utilization, averaging around 11% and ranging from 2% to 48%, is under-optimized.
Further optimization of the code could possibly achieve a stable and high network utilization.
Originally posted by @zhengpeirong in https://github.com/b4rtaz/distributed-llama/discussions/41#discussioncomment-9480575