
OOM during model loading on first node in cluster with Mali GPU

Open · artistlu opened this issue 1 year ago · 0 comments

I have a cluster of 9 identical nodes. However, when the model loads, one seemingly random node runs out of memory (OOM) during loading. Each of my devices uses a Mali GPU with less than 3GB of memory.
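
For a rough sense of the numbers, the byte counts of the `GPU <- DISK` copies in the loading log can be summed for the first shard (layers 0-2). In this log that shard also loads `norm.weight`, `tok_embeddings.weight`, and a final 1,050,673,152-byte view (the last line before the process is killed, presumably the output projection). This is a minimal sketch that ignores activations, the KV cache, and runtime overhead; `shard_weight_bytes` and the constants are illustrative names, not part of exo:

```python
# Byte counts copied from the "GPU <- DISK" lines of the loading log.
PER_LAYER_BYTES = {
    "attention.wq": 33_554_432,
    "attention.wk": 8_388_608,
    "attention.wv": 8_388_608,
    "attention.wo": 33_554_432,
    "feed_forward.w1": 117_440_512,
    "feed_forward.w2": 117_440_512,
    "feed_forward.w3": 117_440_512,
    "attention_norm": 8_192,
    "ffn_norm": 8_192,
}
EMBEDDING_BYTES = 1_050_673_152   # tok_embeddings.weight
FINAL_VIEW_BYTES = 1_050_673_152  # last view in the log; presumably the output head
NORM_BYTES = 8_192                # norm.weight


def shard_weight_bytes(start_layer: int, end_layer: int, *,
                       embeddings: bool, head: bool) -> int:
    """Estimate weight bytes for layers start_layer..end_layer (inclusive)."""
    n_layers = end_layer - start_layer + 1
    total = n_layers * sum(PER_LAYER_BYTES.values())
    if embeddings:
        total += EMBEDDING_BYTES
    if head:
        # Final norm plus the large tensor loaded last in the log.
        total += NORM_BYTES + FINAL_VIEW_BYTES
    return total


if __name__ == "__main__":
    layers_only = shard_weight_bytes(0, 2, embeddings=False, head=False)
    first_shard = shard_weight_bytes(0, 2, embeddings=True, head=True)
    print(f"layers 0-2 only:           {layers_only / 1e9:.2f} GB")
    print(f"layers 0-2 + embed + head: {first_shard / 1e9:.2f} GB")
```

With these figures the script prints about 1.31 GB for layers 0-2 alone (matching the `mem 1.31 GB` the log reports after `layers.2.ffn_norm.weight`) and about 3.41 GB once the embedding table and the final tensor are included, which is above the under-3GB budget described above.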

I would like to know whether it is possible to configure how layers are distributed across the nodes, so as to optimize memory usage and prevent these OOM errors. Also, regardless of the number of nodes in the cluster, the reported total TFLOPS is always 0, which seems incorrect.
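
On the layer split: the shard boundaries visible in the log (layers 0-2, 21-23 and 24-27 of 32) are consistent with cutting the layers at the floor of each node's cumulative share of the total reported memory. The sketch below reproduces those boundaries under that assumption; it is not exo's code. The `reserved_mb` argument is a hypothetical knob (not an exo option) that illustrates how subtracting the embedding and output weights from the first node's budget would push layers onto the other nodes:

```python
from typing import List, Optional, Tuple


def partition_layers(
    memory_mb: List[int],
    n_layers: int,
    reserved_mb: Optional[List[int]] = None,
) -> List[Tuple[int, int]]:
    """Split n_layers into contiguous (start, end) shards, end inclusive,
    in proportion to each node's usable memory.

    Integer arithmetic avoids floating-point drift in the cumulative share.
    A node whose share rounds down to zero layers gets an empty shard
    (end < start).
    """
    reserved = reserved_mb if reserved_mb is not None else [0] * len(memory_mb)
    if len(reserved) != len(memory_mb):
        raise ValueError("reserved_mb must have one entry per node")
    usable = [max(m - r, 0) for m, r in zip(memory_mb, reserved)]
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable memory")

    shards: List[Tuple[int, int]] = []
    cumulative = 0
    start = 0
    for mem in usable:
        cumulative += mem
        end = n_layers * cumulative // total  # exclusive boundary
        shards.append((start, end - 1))
        start = end
    return shards


if __name__ == "__main__":
    nodes = [3718] * 9  # nine nodes with 3718MB each, as in the topology output
    print(partition_layers(nodes, 32))
    # Hypothetical: reserve ~2004 MB (about 2 x 1,050,673,152 bytes) on node 0
    # for the embedding table and output head.
    print(partition_layers(nodes, 32, reserved_mb=[2004] + [0] * 8))
```

With equal memory the first call yields shards starting (0, 2), (3, 6), ... and ending (21, 23), (24, 27), (28, 31). With the hypothetical reservation the first node keeps only layer 0; whether that actually fits still depends on activations and runtime overhead.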

Could you please provide guidance on the following:

1. How can I configure the layer distribution across the nodes to make better use of the available memory and avoid the OOM errors?
2. Why is the total TFLOPS reported as 0 even though I have multiple nodes? Is this a known issue with Mali GPUs, or a problem with my configuration?
3. Any other insights or suggestions for improving the performance and stability of this multi-node setup would be greatly appreciated.
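
On the 0 TFLOPS question: every node in the topology output reports `Chip: Unknown Chip` together with `fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS`, so any cluster total that simply sums the per-node figures can only be 0. A minimal sketch of that mechanism, assuming the per-node figures come from a static table keyed by chip name with a zero fallback for unrecognized chips; `CHIP_TFLOPS`, its placeholder entry, and the helper functions are hypothetical, not exo's actual table:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceFlops:
    fp32: float
    fp16: float
    int8: float


# Hypothetical lookup table; the chip name and numbers are placeholders.
CHIP_TFLOPS: Dict[str, DeviceFlops] = {
    "Example Chip": DeviceFlops(fp32=1.0, fp16=2.0, int8=4.0),
}
UNKNOWN = DeviceFlops(fp32=0.0, fp16=0.0, int8=0.0)


def lookup_flops(chip: str) -> DeviceFlops:
    # Unrecognized chips fall back to zero instead of being measured.
    return CHIP_TFLOPS.get(chip, UNKNOWN)


def cluster_fp16_tflops(chips: List[str]) -> float:
    # The cluster total is just the sum of the per-node table values.
    return sum(lookup_flops(chip).fp16 for chip in chips)


if __name__ == "__main__":
    print(f"{cluster_fp16_tflops(['Unknown Chip'] * 9):.2f} TFLOPS")
```

Under this assumption the 0 is a missing table entry rather than a measurement, and adding more "Unknown Chip" nodes cannot change it.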

Topology collection task executed.
Current topology: Topology(Nodes: {f6a3c47c-0abf-42b6-a3ef-41c61f326f95: Model: 
Linux Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. 
Flops: fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS, 
af84f14c-18ff-4510-937b-c9adbebf2025: Model: Linux Box (Device: GPU). Chip: 
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 572fe1b3-32ef-4189-bf21-9c668b340c5a: Model: Linux 
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops: 
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS, 
97ce079c-65eb-49af-9afc-5d1dfdda283b: Model: Linux Box (Device: GPU). Chip: 
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 1532d6ac-75de-4a97-9061-bfa49159d93b: Model: Linux 
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops: 
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS, 
975e879a-1838-4c17-ba30-5a3e2d23e7d6: Model: Linux Box (Device: GPU). Chip: 
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, 10b6b9a8-c2c2-44f3-9467-be79be5bfce6: Model: Linux 
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops: 
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS, 
2297fcb3-92c0-4c1d-8212-ca9b83fa4752: Model: Linux Box (Device: GPU). Chip: 
Unknown Chip (Device: GPU). Memory: 3718MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00
TFLOPS, int8: 0.00 TFLOPS, ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574: Model: Linux 
Box (Device: GPU). Chip: Unknown Chip (Device: GPU). Memory: 3718MB. Flops: 
fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00 TFLOPS}, Edges: 
{f6a3c47c-0abf-42b6-a3ef-41c61f326f95: {'572fe1b3-32ef-4189-bf21-9c668b340c5a', 
'975e879a-1838-4c17-ba30-5a3e2d23e7d6', '1532d6ac-75de-4a97-9061-bfa49159d93b', 
'af84f14c-18ff-4510-937b-c9adbebf2025', 'ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574', 
'97ce079c-65eb-49af-9afc-5d1dfdda283b', '10b6b9a8-c2c2-44f3-9467-be79be5bfce6', 
'2297fcb3-92c0-4c1d-8212-ca9b83fa4752'}, af84f14c-18ff-4510-937b-c9adbebf2025: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 572fe1b3-32ef-4189-bf21-9c668b340c5a: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 97ce079c-65eb-49af-9afc-5d1dfdda283b: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 1532d6ac-75de-4a97-9061-bfa49159d93b: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 975e879a-1838-4c17-ba30-5a3e2d23e7d6: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 10b6b9a8-c2c2-44f3-9467-be79be5bfce6: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, 2297fcb3-92c0-4c1d-8212-ca9b83fa4752: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}, ac8eb7e2-ea9f-4717-a9ed-ccfac4e8f574: 
{'f6a3c47c-0abf-42b6-a3ef-41c61f326f95'}})
Received SendOpaqueStatus request: 
request_id='5930d8a3-36b1-4b78-a91d-3510331dcd3c' status='{"type": 
"node_status", "node_id": "2297fcb3-92c0-4c1d-8212-ca9b83fa4752", "status": 
"start_process_prompt", "base_shard": {"model_id": "llama3-8b-sfr", 
"start_layer": 0, "end_layer": 0, "n_layers": 32}, "shard": {"model_id": 
"llama3-8b-sfr", "start_layer": 21, "end_layer": 23, "n_layers": 32}, "prompt": 
"<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "inference_state": null, 
"request_id": "5930d8a3-36b1-4b78-a91d-3510331dcd3c"}'
Received SendOpaqueStatus request: 
request_id='5930d8a3-36b1-4b78-a91d-3510331dcd3c' status='{"type": 
"node_status", "node_id": "1532d6ac-75de-4a97-9061-bfa49159d93b", "status": 
"start_process_prompt", "base_shard": {"model_id": "llama3-8b-sfr", 
"start_layer": 24, "end_layer": 27, "n_layers": 32}, "shard": {"model_id": 
"llama3-8b-sfr", "start_layer": 24, "end_layer": 27, "n_layers": 32}, "prompt": 
"<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "inference_state": null, 
"request_id": "5930d8a3-36b1-4b78-a91d-3510331dcd3c"}'
[5930d8a3-36b1-4b78-a91d-3510331dcd3c] process prompt: 
base_shard=Shard(model_id='llama3-8b-sfr', start_layer=0, end_layer=2, 
n_layers=32) shard=Shard(model_id='llama3-8b-sfr', start_layer=0, end_layer=2, 
n_layers=32) prompt='<|im_start|>user\nWhat is the meaning of 
exo?<|im_end|>\n<|im_start|>assistant\n'
opened device NPY from pid:394787
opened device 
DISK:/nasroot/models/Meta-Llama-3-8B/model-00001-of-00004.safetensors from 
pid:394787
*** DISK:/n    1 empty 4976698672 dtypes.uchar          arg   1 mem  0.00 GB 
*** DISK:/n    2 view        8 @ 0                      arg   2 mem  0.00 GB 
opened device CLANG from pid:394787
*** CLANG      3 copy        8,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
19.73ms/    19.73ms (    0.00 GFLOPS,    0.00 GB/s) 
*** DISK:/n    4 view     9512 @ 8                      arg   2 mem  0.00 GB 
*** CLANG      5 copy     9512,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
24.79us/    19.76ms (    0.00 GFLOPS,    0.38 GB/s) 
opened device 
DISK:/nasroot/models/Meta-Llama-3-8B/model-00002-of-00004.safetensors from 
pid:394787
*** DISK:/n    6 empty 4999802720 dtypes.uchar          arg   1 mem  0.00 GB 
*** DISK:/n    7 view        8 @ 0                      arg   2 mem  0.00 GB 
*** CLANG      8 copy        8,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
27.56ms/    47.32ms (    0.00 GFLOPS,    0.00 GB/s) 
*** DISK:/n    9 view    12120 @ 8                      arg   2 mem  0.00 GB 
*** CLANG     10 copy    12120,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
16.04us/    47.34ms (    0.00 GFLOPS,    0.76 GB/s) 
opened device 
DISK:/nasroot/models/Meta-Llama-3-8B/model-00003-of-00004.safetensors from 
pid:394787
*** DISK:/n   11 empty 4915916176 dtypes.uchar          arg   1 mem  0.00 GB 
*** DISK:/n   12 view        8 @ 0                      arg   2 mem  0.00 GB 
*** CLANG     13 copy        8,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
8140.60us/    55.48ms (    0.00 GFLOPS,    0.00 GB/s) 
*** DISK:/n   14 view    11656 @ 8                      arg   2 mem  0.00 GB 
*** CLANG     15 copy    11656,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
17.79us/    55.50ms (    0.00 GFLOPS,    0.66 GB/s) 
opened device 
DISK:/nasroot/models/Meta-Llama-3-8B/model-00004-of-00004.safetensors from 
pid:394787
*** DISK:/n   16 empty 1168138808 dtypes.uchar          arg   1 mem  0.00 GB 
*** DISK:/n   17 view        8 @ 0                      arg   2 mem  0.00 GB 
*** CLANG     18 copy        8,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
13.17ms/    68.66ms (    0.00 GFLOPS,    0.00 GB/s) 
*** DISK:/n   19 view      560 @ 8                      arg   2 mem  0.00 GB 
*** CLANG     20 copy      560,   CLANG <- DISK:/n      arg   2 mem  0.00 GB tm 
16.33us/    68.68ms (    0.00 GFLOPS,    0.03 GB/s) 
  0%|                                                    | 0/31 [00:00<?, ?it/s]
*** DISK:/n   21 view 33554432 @ 1444963632             arg   2 mem  0.00 GB 
*** GPU       22 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.03 GB tm 
183.50ms/   252.18ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   3%| | 
*** DISK:/n   23 view  8388608 @ 1403020592             arg   2 mem  0.03 GB 
*** GPU       24 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.04 GB tm 
54.54ms/   306.72ms (    0.00 GFLOPS,    0.15 GB/s) 
ram used:  0.03 GB, layers.0.attention.wk.weight                      :   6%| | 
*** DISK:/n   25 view  8388608 @ 1478518064             arg   2 mem  0.04 GB 
*** GPU       26 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.05 GB tm 
45.24ms/   351.96ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  0.04 GB, layers.0.attention.wv.weight                      :  10%| | 
*** DISK:/n   27 view 33554432 @ 1411409200             arg   2 mem  0.05 GB 
*** GPU       28 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.08 GB tm 
177.18ms/   529.15ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  0.05 GB, layers.0.attention.wo.weight                      :  13%|▏| 
*** DISK:/n   29 view 117440512 @ 1168131376            arg   2 mem  0.08 GB 
*** GPU       30 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.20 GB tm 
625.08ms/  1154.22ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  0.08 GB, layers.0.feed_forward.w1.weight                   :  16%|▏| 
*** DISK:/n   31 view 117440512 @ 1050690864            arg   2 mem  0.20 GB 
*** GPU       32 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.32 GB tm 
604.83ms/  1759.05ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  0.20 GB, layers.0.feed_forward.w2.weight                   :  19%|▏| 
*** DISK:/n   33 view 117440512 @ 1285571888            arg   2 mem  0.32 GB 
*** GPU       34 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.44 GB tm 
635.25ms/  2394.31ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.32 GB, layers.0.feed_forward.w3.weight                   :  23%|▏| 
*** DISK:/n   35 view     8192 @ 1050682672             arg   2 mem  0.44 GB 
*** GPU       36 copy     8192,     GPU <- DISK:/n      arg   2 mem  0.44 GB tm 
338.91us/  2394.65ms (    0.00 GFLOPS,    0.02 GB/s) 
ram used:  0.44 GB, layers.0.attention_norm.weight                    :  26%|▎| 
*** DISK:/n   37 view     8192 @ 1403012400             arg   2 mem  0.44 GB 
*** GPU       38 copy     8192,     GPU <- DISK:/n      arg   2 mem  0.44 GB tm 
207.96us/  2394.86ms (    0.00 GFLOPS,    0.04 GB/s) 
ram used:  0.44 GB, layers.0.ffn_norm.weight                          :  29%|▎| 
*** DISK:/n   39 view 33554432 @ 1881187632             arg   2 mem  0.44 GB 
*** GPU       40 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.47 GB tm 
209.94ms/  2604.80ms (    0.00 GFLOPS,    0.16 GB/s) 
ram used:  0.44 GB, layers.1.attention.wq.weight                      :  32%|▎| 
*** DISK:/n   41 view  8388608 @ 1839244592             arg   2 mem  0.47 GB 
*** GPU       42 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.48 GB tm 
49.55ms/  2654.34ms (    0.00 GFLOPS,    0.17 GB/s) 
ram used:  0.47 GB, layers.1.attention.wk.weight                      :  35%|▎| 
*** DISK:/n   43 view  8388608 @ 1914742064             arg   2 mem  0.48 GB 
*** GPU       44 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.49 GB tm 
49.60ms/  2703.94ms (    0.00 GFLOPS,    0.17 GB/s) 
ram used:  0.48 GB, layers.1.attention.wv.weight                      :  39%|▍| 
*** DISK:/n   45 view 33554432 @ 1847633200             arg   2 mem  0.49 GB 
*** GPU       46 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.52 GB tm 
168.08ms/  2872.02ms (    0.00 GFLOPS,    0.20 GB/s) 
ram used:  0.49 GB, layers.1.attention.wo.weight                      :  42%|▍| 
*** DISK:/n   47 view 117440512 @ 1604355376            arg   2 mem  0.52 GB 
*** GPU       48 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.64 GB tm 
660.89ms/  3532.91ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.52 GB, layers.1.feed_forward.w1.weight                   :  45%|▍| 
*** DISK:/n   49 view 117440512 @ 1486914864            arg   2 mem  0.64 GB 
*** GPU       50 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.75 GB tm 
592.79ms/  4125.70ms (    0.00 GFLOPS,    0.20 GB/s) 
ram used:  0.64 GB, layers.1.feed_forward.w2.weight                   :  48%|▍| 
*** DISK:/n   51 view 117440512 @ 1721795888            arg   2 mem  0.75 GB 
*** GPU       52 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  0.87 GB tm 
596.22ms/  4721.92ms (    0.00 GFLOPS,    0.20 GB/s) 
ram used:  0.75 GB, layers.1.feed_forward.w3.weight                   :  52%|▌| 
*** DISK:/n   53 view     8192 @ 1486906672             arg   2 mem  0.87 GB 
*** GPU       54 copy     8192,     GPU <- DISK:/n      arg   2 mem  0.87 GB tm 
280.87us/  4722.20ms (    0.00 GFLOPS,    0.03 GB/s) 
ram used:  0.87 GB, layers.1.attention_norm.weight                    :  55%|▌| 
*** DISK:/n   55 view     8192 @ 1839236400             arg   2 mem  0.87 GB 
*** GPU       56 copy     8192,     GPU <- DISK:/n      arg   2 mem  0.87 GB tm 
254.33us/  4722.45ms (    0.00 GFLOPS,    0.03 GB/s) 
ram used:  0.87 GB, layers.1.ffn_norm.weight                          :  58%|▌| 
*** DISK:/n   57 view 33554432 @ 2317411632             arg   2 mem  0.87 GB 
*** GPU       58 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.91 GB tm 
187.20ms/  4909.65ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.87 GB, layers.2.attention.wq.weight                      :  61%|▌| 
*** DISK:/n   59 view  8388608 @ 2275468592             arg   2 mem  0.91 GB 
*** GPU       60 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.91 GB tm 
47.43ms/  4957.08ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.91 GB, layers.2.attention.wk.weight                      :  65%|▋| 
*** DISK:/n   61 view  8388608 @ 2350966064             arg   2 mem  0.91 GB 
*** GPU       62 copy    8.39M,     GPU <- DISK:/n      arg   2 mem  0.92 GB tm 
48.27ms/  5005.35ms (    0.00 GFLOPS,    0.17 GB/s) 
ram used:  0.91 GB, layers.2.attention.wv.weight                      :  68%|▋| 
*** DISK:/n   63 view 33554432 @ 2283857200             arg   2 mem  0.92 GB 
*** GPU       64 copy   33.55M,     GPU <- DISK:/n      arg   2 mem  0.96 GB tm 
186.63ms/  5191.99ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  0.92 GB, layers.2.attention.wo.weight                      :  71%|▋| 
*** DISK:/n   65 view 117440512 @ 2040579376            arg   2 mem  0.96 GB 
*** GPU       66 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  1.07 GB tm 
594.12ms/  5786.11ms (    0.00 GFLOPS,    0.20 GB/s) 
ram used:  0.96 GB, layers.2.feed_forward.w1.weight                   :  74%|▋| 
*** DISK:/n   67 view 117440512 @ 1923138864            arg   2 mem  1.07 GB 
*** GPU       68 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  1.19 GB tm 
619.61ms/  6405.72ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  1.07 GB, layers.2.feed_forward.w2.weight                   :  77%|▊| 
*** DISK:/n   69 view 117440512 @ 2158019888            arg   2 mem  1.19 GB 
*** GPU       70 copy  117.44M,     GPU <- DISK:/n      arg   2 mem  1.31 GB tm 
646.90ms/  7052.61ms (    0.00 GFLOPS,    0.18 GB/s) 
ram used:  1.19 GB, layers.2.feed_forward.w3.weight                   :  81%|▊| 
*** DISK:/n   71 view     8192 @ 1923130672             arg   2 mem  1.31 GB 
*** GPU       72 copy     8192,     GPU <- DISK:/n      arg   2 mem  1.31 GB tm 
12.68ms/  7065.30ms (    0.00 GFLOPS,    0.00 GB/s) 
ram used:  1.31 GB, layers.2.attention_norm.weight                    :  84%|▊| 
*** DISK:/n   73 view     8192 @ 2275460400             arg   2 mem  1.31 GB 
*** GPU       74 copy     8192,     GPU <- DISK:/n      arg   2 mem  1.31 GB tm 
207.37us/  7065.50ms (    0.00 GFLOPS,    0.04 GB/s) 
ram used:  1.31 GB, layers.2.ffn_norm.weight                          :  87%|▊| 
*** DISK:/n   75 view     8192 @ 1168130616             arg   2 mem  1.31 GB 
*** GPU       76 copy     8192,     GPU <- DISK:/n      arg   2 mem  1.31 GB tm 
9635.95us/  7075.14ms (    0.00 GFLOPS,    0.00 GB/s) 
ram used:  1.31 GB, norm.weight                                       :  90%|▉| 
*** DISK:/n   77 view 1050673152 @ 9520                 arg   2 mem  1.31 GB 
*** GPU       78 copy 1050.67M,     GPU <- DISK:/n      arg   2 mem  2.36 GB tm 
5481.18ms/ 12556.32ms (    0.00 GFLOPS,    0.19 GB/s) 
ram used:  1.31 GB, tok_embeddings.weight                             :  94%|▉| 
*** DISK:/n   79 view 1050673152 @ 568                  arg   2 mem  2.36 GB 
╭─────────────────────────── Exo Cluster (9 nodes) ────────────────────────────╮
│                                                                              │
│                                                            _____  _____      │
│                                                           / _ \ \/ / _ \     │
│                                                          |  __/>  < (_) |    │
│                                                           \___/_/\_\___/     │
│                                                                              │
│                                                                              │
│                                           Web Chat URL (tinychat):           │
│ http://localhost:8000                                                        │
│                                  ChatGPT API endpoint:                       │
│ http://localhost:8000/v1/chat/completions                                    │
│                          GPU poor  ▼                                         │
│ GPU rich                                                                     │
│                                   [🟥🟥🟥🟥🟥🟥🟥🟥🟧🟧🟧🟧🟧🟧🟧🟨🟨🟨🟨🟨  │
│ 🟨🟨🟨🟩🟩🟩🟩🟩🟩🟩]                                                        │
│                               0.00 TFLOPS                                    │
│                                    ▲                                         │
│                                                             Linux Box        │
│ (Device: GPU) 3GB                                                            │
│                                                             0TFLOPS          │
│           Linux Box (Device: GPU) 3GB                       [0.78-0.89]      │
│           0TFLOPS                  --------------------🔴----                │
│           [0.67-0.78]           --🔵                         ----            │
                                      ...                                       Killed

artistlu, Aug 01 '24 01:08