distributed-llama
Add a Docker container test for distributed-llama
# 1 worker + inference
make docker-1-worker-inference
# 3 workers + inference (pass the worker addresses in WORKERS):
make docker-3-worker-inference WORKERS="172.18.0.2:9997 172.18.0.3:9997 172.18.0.4:9997"
My local test results in Docker containers (using the default checkpoint, stories42M.bin):
- 1 worker (1 thread) + inference (1 thread)
💡 dim: 512
💡 hiddenDim: 1376
💡 nLayers: 8
💡 nHeads: 8
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 1024
💡 nSlices: 2
⏩ Loaded 232556544 bytes
🔶 G 38 ms I 38 ms T 0 ms S 49477 kB R 61 kB Hello
🔶 G 42 ms I 39 ms T 2 ms S 69 kB R 61 kB was
🔶 G 44 ms I 42 ms T 1 ms S 69 kB R 61 kB in
🔶 G 44 ms I 39 ms T 5 ms S 69 kB R 61 kB the
🔶 G 42 ms I 42 ms T 0 ms S 69 kB R 61 kB park
🔶 G 47 ms I 45 ms T 2 ms S 69 kB R 61 kB .
🔶 G 44 ms I 41 ms T 2 ms S 69 kB R 61 kB It
🔶 G 43 ms I 40 ms T 3 ms S 69 kB R 61 kB was
🔶 G 42 ms I 39 ms T 3 ms S 69 kB R 61 kB a
🔶 G 40 ms I 39 ms T 1 ms S 69 kB R 61 kB beautiful
🔶 G 42 ms I 38 ms T 4 ms S 69 kB R 61 kB day
🔶 G 43 ms I 40 ms T 2 ms S 69 kB R 61 kB ,
🔶 G 43 ms I 39 ms T 3 ms S 69 kB R 61 kB and
🔶 G 41 ms I 39 ms T 1 ms S 69 kB R 61 kB the
🔶 G 47 ms I 40 ms T 6 ms S 69 kB R 61 kB sun
🔶 G 45 ms I 41 ms T 4 ms S 69 kB R 61 kB was
Generated tokens: 16
Avg generation time: 42.94 ms
Avg inference time: 40.06 ms
Avg transfer time: 2.44 ms
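The per-token lines above can be checked against the reported averages with a small script. This is a minimal sketch, not part of the repository: the helper names `parse_log` and `averages` are hypothetical, and it assumes G, I, and T are the generation, inference, and transfer times in milliseconds (as the summary lines suggest) and that S and R are kilobytes sent and received.

```python
import re

# One per-token line looks like:
#   "🔶 G 38 ms I 38 ms T 0 ms S 49477 kB R 61 kB Hello"
# Assumption: S and R are kB sent and received; the log itself does not say.
LINE_RE = re.compile(
    r"G\s+(\d+) ms I\s+(\d+) ms T\s+(\d+) ms S\s+(\d+) kB R\s+(\d+) kB ?(.*)"
)

def parse_log(text):
    """Return one dict per per-token line; all other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            g, i, t, s, r = (int(x) for x in m.groups()[:5])
            rows.append({"G": g, "I": i, "T": t, "S": s, "R": r, "token": m.group(6)})
    return rows

def averages(rows):
    """Mean G, I, and T in milliseconds over the parsed rows."""
    n = len(rows)
    return {k: sum(row[k] for row in rows) / n for k in ("G", "I", "T")}

# First three lines of the 1-worker run, copied from the log above.
sample = """\
🔶 G 38 ms I 38 ms T 0 ms S 49477 kB R 61 kB Hello
🔶 G 42 ms I 39 ms T 2 ms S 69 kB R 61 kB was
🔶 G 44 ms I 42 ms T 1 ms S 69 kB R 61 kB in
"""
rows = parse_log(sample)
avg = averages(rows)
print(len(rows), {k: round(v, 2) for k, v in avg.items()})
```

Feeding all 16 lines of a run into `parse_log` should reproduce the "Avg ... time" figures printed at the end of that run.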
- 3 workers (1 thread) + inference (1 thread)
💡 dim: 512
💡 hiddenDim: 1376
💡 nLayers: 8
💡 nHeads: 8
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 1024
💡 nSlices: 4
⏩ Loaded 232556544 bytes
🔶 G 41 ms I 34 ms T 7 ms S 74352 kB R 92 kB Hello
🔶 G 48 ms I 42 ms T 5 ms S 240 kB R 92 kB was
🔶 G 65 ms I 45 ms T 18 ms S 240 kB R 92 kB in
🔶 G 45 ms I 34 ms T 10 ms S 240 kB R 92 kB the
🔶 G 35 ms I 33 ms T 2 ms S 240 kB R 92 kB park
🔶 G 38 ms I 34 ms T 3 ms S 240 kB R 92 kB .
🔶 G 43 ms I 35 ms T 8 ms S 240 kB R 92 kB It
🔶 G 47 ms I 38 ms T 8 ms S 240 kB R 92 kB was
🔶 G 41 ms I 34 ms T 7 ms S 240 kB R 92 kB a
🔶 G 45 ms I 38 ms T 6 ms S 240 kB R 92 kB beautiful
🔶 G 37 ms I 35 ms T 2 ms S 240 kB R 92 kB day
🔶 G 36 ms I 33 ms T 3 ms S 240 kB R 92 kB .
🔶 G 40 ms I 35 ms T 5 ms S 240 kB R 92 kB There
🔶 G 40 ms I 35 ms T 5 ms S 240 kB R 92 kB was
🔶 G 36 ms I 33 ms T 2 ms S 240 kB R 92 kB a
🔶 G 41 ms I 33 ms T 8 ms S 240 kB R 92 kB bird
Generated tokens: 16
Avg generation time: 42.38 ms
Avg inference time: 35.69 ms
Avg transfer time: 6.19 ms
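To compare the two runs, the averages from the two summaries above can be subtracted directly. The sketch below only restates the reported numbers; the dictionary and function names are hypothetical.

```python
# Averages copied from the two run summaries above (milliseconds).
one_worker = {"generation": 42.94, "inference": 40.06, "transfer": 2.44}
three_workers = {"generation": 42.38, "inference": 35.69, "transfer": 6.19}

def delta(before, after):
    """Per-metric change when going from `before` to `after`, rounded to 0.01 ms."""
    return {k: round(after[k] - before[k], 2) for k in before}

diff = delta(one_worker, three_workers)
for name, ms in diff.items():
    print(f"{name}: {ms:+.2f} ms")
# generation: -0.56 ms
# inference: -4.37 ms
# transfer: +3.75 ms
```

In this test, going from 1 to 3 workers saves about 4.4 ms of inference per token but adds about 3.8 ms of transfer, so the generation time is almost unchanged. Each run covers only 16 tokens of a small checkpoint, so these differences are indicative at best.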