OOM during model loading on first node in cluster with Mali GPU
I have a cluster of 9 identical nodes. However, the model starts loading on one seemingly random node, and that node runs out of memory (OOM) during loading. My devices use a Mali GPU with less than 3GB of memory each.
I would like to understand whether the distribution of layers across the nodes can be configured to reduce per-node memory usage and prevent the OOM. Separately, regardless of the number of nodes in the cluster, the reported total TFLOPS is 0, which seems incorrect.
Could you please provide guidance on the following:
1. How can I configure the layer distribution across the nodes to make better use of the available memory and avoid the OOM errors?
2. Why is the total TFLOPS reported as 0, even though I have multiple nodes?
3. Is this a known issue with Mali GPUs, or is it a problem with my configuration?

Any insights or suggestions to improve the performance and stability of my multi-node setup would be greatly appreciated.
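
To make the first question concrete, here is a rough sketch of the kind of memory-proportional layer split I have in mind. It is only an illustration, not exo's actual partitioning code: `Node`, `partition_layers`, and the `node-0` to `node-8` IDs are hypothetical names, and the 3718 MB figure is the per-node memory reported in the topology log.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str    # hypothetical identifier, not a real exo node ID
    memory_mb: int  # memory reported for the node


def partition_layers(nodes, n_layers):
    """Assign contiguous layer ranges to nodes in proportion to their memory.

    Returns a list of (node_id, start_layer, end_layer) with inclusive bounds.
    """
    total = sum(n.memory_mb for n in nodes)
    shards = []
    start = 0
    cumulative = 0
    for i, node in enumerate(nodes):
        cumulative += node.memory_mb
        if i == len(nodes) - 1:
            end = n_layers  # last node takes whatever remains
        else:
            end = round(n_layers * cumulative / total)
        if end > start:
            shards.append((node.node_id, start, end - 1))
        start = end
    return shards


# Nine identical nodes, as in the topology log; 32 layers for llama3-8b.
nodes = [Node(f"node-{i}", 3718) for i in range(9)]
shards = partition_layers(nodes, 32)
for node_id, first, last in shards:
    print(f"{node_id}: layers {first}-{last}")
```

A split like this only balances transformer layers. It does not account for extra tensors such as the token embeddings that some shards also have to load, so a node can still end up using more memory than its layer count suggests.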
Topology collection task executed.
Current topology: Topology(Nodes: {f6a3c47c-0abf-42b6-a3ef-41c61f326f95: Model:
Linux Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB.
Flops: fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS,
af84f14c-18ff-4510-937b-c9adbebf2025: Model: Linux Box (Device: GPU). Chip:
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 572fe1b3-32ef-4189-bf21-9c668b340c5a: Model: Linux
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops:
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS,
97ce079c-65eb-49af-9afc-5d1dfdda283b: Model: Linux Box (Device: GPU). Chip:
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 1532d6ac-75de-4a97-9061-bfa49159d93b: Model: Linux
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops:
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS,
975e879a-1838-4c17-ba30-5a3e2d23e7d6: Model: Linux Box (Device: GPU). Chip:
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 10b6b9a8-c2c2-44f3-9467-be79be5bfce6: Model: Linux
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops:
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS,
2297fcb3-92c0-4c1d-8212-ca9b83fa4752: Model: Linux Box (Device: GPU). Chip:
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574: Model: Linux
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops:
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS}, Edges:
{f6a3c47c-0abf-42b6-a3ef-41c61f326f95: {'572fe1b3-32ef-4189-bf21-9c668b340c5a',
'975e879a-1838-4c17-ba30-5a3e2d23e7d6', '1532d6ac-75de-4a97-9061-bfa49159d93b',
'af84f14c-18ff-4510-937b-c9adbebf2025', 'ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574',
'97ce079c-65eb-49af-9afc-5d1dfdda283b', '10b6b9a8-c2c2-44f3-9467-be79be5bfce6',
'2297fcb3-92c0-4c1d-8212-ca9b83fa4752'}, af84f14c-18ff-4510-937b-c9adbebf2025:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 572fe1b3-32ef-4189-bf21-9c668b340c5a:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 97ce079c-65eb-49af-9afc-5d1dfdda283b:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 1532d6ac-75de-4a97-9061-bfa49159d93b:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 975e879a-1838-4c17-ba30-5a3e2d23e7d6:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 10b6b9a8-c2c2-44f3-9467-be79be5bfce6:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 2297fcb3-92c0-4c1d-8212-ca9b83fa4752:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574:
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}})
Received SendOpaqueStatus request:
request_id='5930d8a3-36b1-4b78-a91d-3510331dcd3c' status='{"type":
"node_status", "node_id": "2297fcb3-92c0-4c1d-8212-ca9b83fa4752", "status":
"start_process_prompt", "base_shard": {"model_id": "llama3-8b-sfr",
"start_layer": 0, "end_layer": 0, "n_layers": 32}, "shard": {"model_id":
"llama3-8b-sfr", "start_layer": 21, "end_layer": 23, "n_layers": 32}, "prompt":
"<|im_start|>user\\nWhat is the meaning of
exo?<|im_end|>\\n<|im_start|>assistant\\n", "inference_state": null,
"request_id": "5930d8a3-36b1-4b78-a91d-3510331dcd3c"}'
Received SendOpaqueStatus request:
request_id='5930d8a3-36b1-4b78-a91d-3510331dcd3c' status='{"type":
"node_status", "node_id": "1532d6ac-75de-4a97-9061-bfa49159d93b", "status":
"start_process_prompt", "base_shard": {"model_id": "llama3-8b-sfr",
"start_layer": 24, "end_layer": 27, "n_layers": 32}, "shard": {"model_id":
"llama3-8b-sfr", "start_layer": 24, "end_layer": 27, "n_layers": 32}, "prompt":
"<|im_start|>user\\nWhat is the meaning of
exo?<|im_end|>\\n<|im_start|>assistant\\n", "inference_state": null,
"request_id": "5930d8a3-36b1-4b78-a91d-3510331dcd3c"}'
[5930d8a3-36b1-4b78-a91d-3510331dcd3c] process prompt:
base_shard=Shard(model_id='llama3-8b-sfr', start_layer=0, end_layer=2,
n_layers=32) shard=Shard(model_id='llama3-8b-sfr', start_layer=0, end_layer=2,
n_layers=32) prompt='<|im_start|>user\nWhat is the meaning of
exo?<|im_end|>\n<|im_start|>assistant\n'
opened device NPY from pid:394787
opened device
DISK:/nasroot/models/Meta-Llama-3-8B/model-00001-of-00004.safetensors from
pid:394787
*** DISK:/n 1 empty 4976698672 dtypes.uchar arg 1 mem 0.00 GB
*** DISK:/n 2 view 8 @ 0 arg 2 mem 0.00 GB
opened device CLANG from pid:394787
*** CLANG 3 copy 8, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
19.73ms/ 19.73ms ( 0.00 GFLOPS, 0.00 GB/s)
*** DISK:/n 4 view 9512 @ 8 arg 2 mem 0.00 GB
*** CLANG 5 copy 9512, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
24.79us/ 19.76ms ( 0.00 GFLOPS, 0.38 GB/s)
opened device
DISK:/nasroot/models/Meta-Llama-3-8B/model-00002-of-00004.safetensors from
pid:394787
*** DISK:/n 6 empty 4999802720 dtypes.uchar arg 1 mem 0.00 GB
*** DISK:/n 7 view 8 @ 0 arg 2 mem 0.00 GB
*** CLANG 8 copy 8, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
27.56ms/ 47.32ms ( 0.00 GFLOPS, 0.00 GB/s)
*** DISK:/n 9 view 12120 @ 8 arg 2 mem 0.00 GB
*** CLANG 10 copy 12120, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
16.04us/ 47.34ms ( 0.00 GFLOPS, 0.76 GB/s)
opened device
DISK:/nasroot/models/Meta-Llama-3-8B/model-00003-of-00004.safetensors from
pid:394787
*** DISK:/n 11 empty 4915916176 dtypes.uchar arg 1 mem 0.00 GB
*** DISK:/n 12 view 8 @ 0 arg 2 mem 0.00 GB
*** CLANG 13 copy 8, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
8140.60us/ 55.48ms ( 0.00 GFLOPS, 0.00 GB/s)
*** DISK:/n 14 view 11656 @ 8 arg 2 mem 0.00 GB
*** CLANG 15 copy 11656, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
17.79us/ 55.50ms ( 0.00 GFLOPS, 0.66 GB/s)
opened device
DISK:/nasroot/models/Meta-Llama-3-8B/model-00004-of-00004.safetensors from
pid:394787
*** DISK:/n 16 empty 1168138808 dtypes.uchar arg 1 mem 0.00 GB
*** DISK:/n 17 view 8 @ 0 arg 2 mem 0.00 GB
*** CLANG 18 copy 8, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
13.17ms/ 68.66ms ( 0.00 GFLOPS, 0.00 GB/s)
*** DISK:/n 19 view 560 @ 8 arg 2 mem 0.00 GB
*** CLANG 20 copy 560, CLANG <- DISK:/n arg 2 mem 0.00 GB tm
16.33us/ 68.68ms ( 0.00 GFLOPS, 0.03 GB/s)
0%| | 0/31 [00:00<?, ?it/s]
*** DISK:/n 21 view 33554432 @ 1444963632 arg 2 mem 0.00 GB
*** GPU 22 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.03 GB tm
183.50ms/ 252.18ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.00 GB, layers.0.attention.wq.weight : 3%| |
*** DISK:/n 23 view 8388608 @ 1403020592 arg 2 mem 0.03 GB
*** GPU 24 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.04 GB tm
54.54ms/ 306.72ms ( 0.00 GFLOPS, 0.15 GB/s)
ram used: 0.03 GB, layers.0.attention.wk.weight : 6%| |
*** DISK:/n 25 view 8388608 @ 1478518064 arg 2 mem 0.04 GB
*** GPU 26 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.05 GB tm
45.24ms/ 351.96ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 0.04 GB, layers.0.attention.wv.weight : 10%| |
*** DISK:/n 27 view 33554432 @ 1411409200 arg 2 mem 0.05 GB
*** GPU 28 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.08 GB tm
177.18ms/ 529.15ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 0.05 GB, layers.0.attention.wo.weight : 13%|▏|
*** DISK:/n 29 view 117440512 @ 1168131376 arg 2 mem 0.08 GB
*** GPU 30 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.20 GB tm
625.08ms/ 1154.22ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 0.08 GB, layers.0.feed_forward.w1.weight : 16%|▏|
*** DISK:/n 31 view 117440512 @ 1050690864 arg 2 mem 0.20 GB
*** GPU 32 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.32 GB tm
604.83ms/ 1759.05ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 0.20 GB, layers.0.feed_forward.w2.weight : 19%|▏|
*** DISK:/n 33 view 117440512 @ 1285571888 arg 2 mem 0.32 GB
*** GPU 34 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.44 GB tm
635.25ms/ 2394.31ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.32 GB, layers.0.feed_forward.w3.weight : 23%|▏|
*** DISK:/n 35 view 8192 @ 1050682672 arg 2 mem 0.44 GB
*** GPU 36 copy 8192, GPU <- DISK:/n arg 2 mem 0.44 GB tm
338.91us/ 2394.65ms ( 0.00 GFLOPS, 0.02 GB/s)
ram used: 0.44 GB, layers.0.attention_norm.weight : 26%|▎|
*** DISK:/n 37 view 8192 @ 1403012400 arg 2 mem 0.44 GB
*** GPU 38 copy 8192, GPU <- DISK:/n arg 2 mem 0.44 GB tm
207.96us/ 2394.86ms ( 0.00 GFLOPS, 0.04 GB/s)
ram used: 0.44 GB, layers.0.ffn_norm.weight : 29%|▎|
*** DISK:/n 39 view 33554432 @ 1881187632 arg 2 mem 0.44 GB
*** GPU 40 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.47 GB tm
209.94ms/ 2604.80ms ( 0.00 GFLOPS, 0.16 GB/s)
ram used: 0.44 GB, layers.1.attention.wq.weight : 32%|▎|
*** DISK:/n 41 view 8388608 @ 1839244592 arg 2 mem 0.47 GB
*** GPU 42 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.48 GB tm
49.55ms/ 2654.34ms ( 0.00 GFLOPS, 0.17 GB/s)
ram used: 0.47 GB, layers.1.attention.wk.weight : 35%|▎|
*** DISK:/n 43 view 8388608 @ 1914742064 arg 2 mem 0.48 GB
*** GPU 44 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.49 GB tm
49.60ms/ 2703.94ms ( 0.00 GFLOPS, 0.17 GB/s)
ram used: 0.48 GB, layers.1.attention.wv.weight : 39%|▍|
*** DISK:/n 45 view 33554432 @ 1847633200 arg 2 mem 0.49 GB
*** GPU 46 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.52 GB tm
168.08ms/ 2872.02ms ( 0.00 GFLOPS, 0.20 GB/s)
ram used: 0.49 GB, layers.1.attention.wo.weight : 42%|▍|
*** DISK:/n 47 view 117440512 @ 1604355376 arg 2 mem 0.52 GB
*** GPU 48 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.64 GB tm
660.89ms/ 3532.91ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.52 GB, layers.1.feed_forward.w1.weight : 45%|▍|
*** DISK:/n 49 view 117440512 @ 1486914864 arg 2 mem 0.64 GB
*** GPU 50 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.75 GB tm
592.79ms/ 4125.70ms ( 0.00 GFLOPS, 0.20 GB/s)
ram used: 0.64 GB, layers.1.feed_forward.w2.weight : 48%|▍|
*** DISK:/n 51 view 117440512 @ 1721795888 arg 2 mem 0.75 GB
*** GPU 52 copy 117.44M, GPU <- DISK:/n arg 2 mem 0.87 GB tm
596.22ms/ 4721.92ms ( 0.00 GFLOPS, 0.20 GB/s)
ram used: 0.75 GB, layers.1.feed_forward.w3.weight : 52%|▌|
*** DISK:/n 53 view 8192 @ 1486906672 arg 2 mem 0.87 GB
*** GPU 54 copy 8192, GPU <- DISK:/n arg 2 mem 0.87 GB tm
280.87us/ 4722.20ms ( 0.00 GFLOPS, 0.03 GB/s)
ram used: 0.87 GB, layers.1.attention_norm.weight : 55%|▌|
*** DISK:/n 55 view 8192 @ 1839236400 arg 2 mem 0.87 GB
*** GPU 56 copy 8192, GPU <- DISK:/n arg 2 mem 0.87 GB tm
254.33us/ 4722.45ms ( 0.00 GFLOPS, 0.03 GB/s)
ram used: 0.87 GB, layers.1.ffn_norm.weight : 58%|▌|
*** DISK:/n 57 view 33554432 @ 2317411632 arg 2 mem 0.87 GB
*** GPU 58 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.91 GB tm
187.20ms/ 4909.65ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.87 GB, layers.2.attention.wq.weight : 61%|▌|
*** DISK:/n 59 view 8388608 @ 2275468592 arg 2 mem 0.91 GB
*** GPU 60 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.91 GB tm
47.43ms/ 4957.08ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.91 GB, layers.2.attention.wk.weight : 65%|▋|
*** DISK:/n 61 view 8388608 @ 2350966064 arg 2 mem 0.91 GB
*** GPU 62 copy 8.39M, GPU <- DISK:/n arg 2 mem 0.92 GB tm
48.27ms/ 5005.35ms ( 0.00 GFLOPS, 0.17 GB/s)
ram used: 0.91 GB, layers.2.attention.wv.weight : 68%|▋|
*** DISK:/n 63 view 33554432 @ 2283857200 arg 2 mem 0.92 GB
*** GPU 64 copy 33.55M, GPU <- DISK:/n arg 2 mem 0.96 GB tm
186.63ms/ 5191.99ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 0.92 GB, layers.2.attention.wo.weight : 71%|▋|
*** DISK:/n 65 view 117440512 @ 2040579376 arg 2 mem 0.96 GB
*** GPU 66 copy 117.44M, GPU <- DISK:/n arg 2 mem 1.07 GB tm
594.12ms/ 5786.11ms ( 0.00 GFLOPS, 0.20 GB/s)
ram used: 0.96 GB, layers.2.feed_forward.w1.weight : 74%|▋|
*** DISK:/n 67 view 117440512 @ 1923138864 arg 2 mem 1.07 GB
*** GPU 68 copy 117.44M, GPU <- DISK:/n arg 2 mem 1.19 GB tm
619.61ms/ 6405.72ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 1.07 GB, layers.2.feed_forward.w2.weight : 77%|▊|
*** DISK:/n 69 view 117440512 @ 2158019888 arg 2 mem 1.19 GB
*** GPU 70 copy 117.44M, GPU <- DISK:/n arg 2 mem 1.31 GB tm
646.90ms/ 7052.61ms ( 0.00 GFLOPS, 0.18 GB/s)
ram used: 1.19 GB, layers.2.feed_forward.w3.weight : 81%|▊|
*** DISK:/n 71 view 8192 @ 1923130672 arg 2 mem 1.31 GB
*** GPU 72 copy 8192, GPU <- DISK:/n arg 2 mem 1.31 GB tm
12.68ms/ 7065.30ms ( 0.00 GFLOPS, 0.00 GB/s)
ram used: 1.31 GB, layers.2.attention_norm.weight : 84%|▊|
*** DISK:/n 73 view 8192 @ 2275460400 arg 2 mem 1.31 GB
*** GPU 74 copy 8192, GPU <- DISK:/n arg 2 mem 1.31 GB tm
207.37us/ 7065.50ms ( 0.00 GFLOPS, 0.04 GB/s)
ram used: 1.31 GB, layers.2.ffn_norm.weight : 87%|▊|
*** DISK:/n 75 view 8192 @ 1168130616 arg 2 mem 1.31 GB
*** GPU 76 copy 8192, GPU <- DISK:/n arg 2 mem 1.31 GB tm
9635.95us/ 7075.14ms ( 0.00 GFLOPS, 0.00 GB/s)
ram used: 1.31 GB, norm.weight : 90%|▉|
*** DISK:/n 77 view 1050673152 @ 9520 arg 2 mem 1.31 GB
*** GPU 78 copy 1050.67M, GPU <- DISK:/n arg 2 mem 2.36 GB tm
5481.18ms/ 12556.32ms ( 0.00 GFLOPS, 0.19 GB/s)
ram used: 1.31 GB, tok_embeddings.weight : 94%|▉|
*** DISK:/n 79 view 1050673152 @ 568 arg 2 mem 2.36 GB
╭─────────────────────────── Exo Cluster (9 nodes) ────────────────────────────╮
│ │
│ _____ _____ │
│ / _ \ \/ / _ \ │
│ | __/> < (_) | │
│ \___/_/\_\___/ │
│ │
│ │
│ Web Chat URL (tinychat): │
│ http://localhost:8000 │
│ ChatGPT API endpoint: │
│ http://localhost:8000/v1/chat/completions │
│ GPU poor ▼ │
│ GPU rich │
│ [🟥🟥🟥🟥🟥🟥🟥🟥🟧🟧🟧🟧🟧🟧🟧🟨🟨🟨🟨🟨 │
│ 🟨🟨🟨🟩🟩🟩🟩🟩🟩🟩] │
│ 0.00 TFLOPS │
│ ▲ │
│ Linux Box │
│ (Device: GPU) 3GB │
│ 0TFLOPS │
│ Linux Box (Device: GPU) 3GB [0.78-0.89] │
│ 0TFLOPS --------------------🔴---- │
│ [0.67-0.78] --🔵 ---- │
... Killed
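
For reference, adding up the tensor sizes from the `view <bytes> @ <offset>` lines in the log gives a rough picture of why the node holding layers 0-2 runs out of memory. The byte counts below are copied from the log. The name of the last tensor is not shown before the process is killed, so counting it as another full allocation on the GPU is an assumption on my part.

```python
# Byte sizes copied from the "view <bytes> @ <offset>" lines in the log.
PER_LAYER = {
    "attention.wq": 33_554_432,
    "attention.wk": 8_388_608,
    "attention.wv": 8_388_608,
    "attention.wo": 33_554_432,
    "feed_forward.w1": 117_440_512,
    "feed_forward.w2": 117_440_512,
    "feed_forward.w3": 117_440_512,
    "attention_norm": 8_192,
    "ffn_norm": 8_192,
}
NORM = 8_192                    # norm.weight
TOK_EMBEDDINGS = 1_050_673_152  # tok_embeddings.weight
NEXT_VIEW = 1_050_673_152       # last view before the kill; tensor name not in the log


def gb(n_bytes):
    """Convert bytes to decimal gigabytes, matching the 'mem' column in the log."""
    return n_bytes / 1e9


layer_bytes = sum(PER_LAYER.values())
loaded = 3 * layer_bytes + NORM + TOK_EMBEDDINGS  # layers 0-2 plus the extras
projected = loaded + NEXT_VIEW                    # assumption: also copied to the GPU

print(f"one layer:         {gb(layer_bytes):.2f} GB")
print(f"loaded at the end: {gb(loaded):.2f} GB")
print(f"after next tensor: {gb(projected):.2f} GB")
```

The "loaded" figure matches the last `mem 2.36 GB` reading in the log. If the next tensor is also copied to the GPU, the total comes to about 3.41 GB, which is more than the Mali GPU has available even though the three layers themselves take only about 1.31 GB.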