
Jetson devices reload the model and the inference speed is too slow (llama3.2-1b, 8bit)

Open Mr-lwd opened this issue 9 months ago • 5 comments

ram used: 0.14 GB, layers.1.feed_forward.w1.weight :   9% | 14/148 [00:00<00:03, 33.65it/s]
[... per-weight loading progress for layers 1-15 elided ...]
ram used: 1.22 GB, norm.weight :  98% | 145/148 [00:02<00:00, 55.90it/s]
Download error on attempt 4/30 for repo_id='TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R' revision='main' path='model.safetensors.index.json' target_dir=PosixPath('/tmp/exo/TriAiExperiments--SFR-Iterative-DPO-LLaMA-3-70B-R')
Traceback (most recent call last):
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 134, in download_file_with_retry
    try: return await _download_file(repo_id, revision, path, target_dir, on_progress)
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 156, in _download_file
    assert r.status in [200, 206], f"Failed to download {path} from {url}: {r.status}"
AssertionError: Failed to download model.safetensors.index.json from https://hf-mirror.com/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401
ram used: 1.74 GB, freqs_cis : 100% | 148/148 [00:03<00:00, 43.52it/s]
loaded weights in 3409.17 ms, 1.74 GB loaded at 0.51 GB/s
  0% | 0/148 [00:00<?, ?it/s]
ram used: 1.74 GB, layers.0.attention.wq.weight :   1% | 1/148 [00:00<00:03, 42.65it/s]
[... all 148 weights are loaded again from scratch; a "Download error on attempt 5/30" with the identical 401 traceback is interleaved; ram climbs to 3.48 GB ...]
ram used: 3.48 GB, freqs_cis : 100% | 148/148 [00:03<00:00, 41.05it/s]
loaded weights in 3614.36 ms, 1.74 GB loaded at 0.48 GB/s

Mr-lwd · Mar 11, 2025

Sorry to bother you, but I'd love to know the solution.

Mr-lwd · Mar 11, 2025

[Image]

I can successfully run exo on my devices (Jetson Orin Nano 8GB + Jetson Orin NX 16GB); however, inference is very slow (1–2 tokens per second for llama3.2:1b-8bit), even though "jtop" shows the GPU running at a high frequency.

In Ollama, even deepseek-r1 (7B, 4.9 GB, 4-bit) runs at high speed (7 tok/s on the Jetson Orin Nano 8GB).

So I'm not convinced that Jetson devices simply can't run any 8-bit LLM at high speed.
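For reference, here is a rough way to measure tokens per second against exo's ChatGPT-compatible API. This is only a sketch: the port and model name follow the examples in exo's README, and it assumes the response includes an OpenAI-style usage field; adjust for your setup.

```python
import time
import requests  # pip install requests

t0 = time.time()
r = requests.post(
    "http://localhost:52415/v1/chat/completions",  # exo's default API endpoint per the README
    json={
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "Count from 1 to 50."}],
    },
    timeout=600,
)
elapsed = time.time() - t0
tokens = r.json()["usage"]["completion_tokens"]  # assumes an OpenAI-style usage field
print(f"{tokens} tokens in {elapsed:.1f} s -> {tokens / elapsed:.2f} tok/s")
```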

Mr-lwd · Mar 12, 2025

I may have figured out why the speed is so slow. Jetson devices only support Python 3.10 with torch 2.5–2.6; you can create a venv with Python 3.12, but CUDA is not available in it (torch.cuda.is_available() returns False).
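A quick way to verify this in each environment (standard PyTorch calls, nothing exo-specific):

```python
import torch

# On Jetson, NVIDIA's CUDA-enabled torch wheels target Python 3.10;
# in a Python 3.12 venv you typically end up with a CPU-only build.
print(torch.__version__)
print(torch.cuda.is_available())  # True under Python 3.10, False under 3.12
```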

Mr-lwd · Mar 12, 2025

Hi, I installed Python 3.12 on my Jetson Orin NX 16GB and exo installed fine, but it reports 0 (zero) TFLOPS; my Mac mini M4 32GB shows about 8.9 TFLOPS. When I run exo on the Jetson and the Mac mini they can discover each other, but the total is still only 8.9 TFLOPS. I downloaded deepseek-r1:32b with Ollama on my Mac mini, but when I query it with "curl --" I do not get any response. Could you help me run exo with the Ollama deepseek-r1:32b?

dongpan90 · Mar 17, 2025

> Hi, I installed Python 3.12 on my Jetson Orin NX 16GB and exo installed fine, but it reports 0 (zero) TFLOPS; [...] Could you help me run exo with the Ollama deepseek-r1:32b?

Hello, it seems that the newest LLMs can only run on MLX, not on tinygrad for devices with CUDA, so you cannot run deepseek on a Jetson. As for the "0 TFLOPS" problem, you have to add your devices to CHIP_FLOPS in exo/topology/device_capabilities.py:

```python
CHIP_FLOPS = {
  # ... existing entries ...
  "Jetson_Orin_Nano": DeviceFlops(fp32=1.28*TFLOPS, fp16=2.56*TFLOPS, int8=40*TFLOPS),
  "Jetson_Orin_NX": DeviceFlops(fp32=1.88*TFLOPS, fp16=3.76*TFLOPS, int8=100*TFLOPS),
}
```
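(For what it's worth, those int8 figures line up with NVIDIA's advertised 40 TOPS for the Orin Nano 8GB and 100 TOPS for the Orin NX 16GB, so they should at least be the right order of magnitude.)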

and change the memory-capability detection in linux_device_capabilities() in device_capabilities.py:

```python
async def linux_device_capabilities() -> DeviceCapabilities:
  import psutil
  from tinygrad import Device

  if DEBUG >= 2: print(f"tinygrad {Device.DEFAULT=}")
  if Device.DEFAULT == "CUDA" or Device.DEFAULT == "NV" or Device.DEFAULT == "GPU":
    try:
      import pynvml
      pynvml.nvmlInit()
      handle = pynvml.nvmlDeviceGetHandleByIndex(0)
      gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
      gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
      gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
      pynvml.nvmlShutdown()
    except Exception as e:
      if DEBUG >= 2: print(f"pynvml failed: {e}")
      # pynvml does not work on Jetson, so identify the board from the device tree instead.
      try:
        with open("/proc/device-tree/compatible") as f:
          compatible = f.read().lower()
        if "tegra194" in compatible: gpu_name = "XAVIER"
        elif "tegra210" in compatible: gpu_name = "TX1"
        elif "tegra186" in compatible: gpu_name = "TX2"
        elif "p3768-0000+p3767-0003" in compatible: gpu_name = "Jetson_Orin_Nano"
        elif "p3768-0000+p3767-0000" in compatible: gpu_name = "Jetson_Orin_NX"
        else: gpu_name = "JETSON_GPU"

        # Jetson has unified memory, so report total system RAM as the "GPU" memory.
        with open("/proc/meminfo") as f:
          for line in f:
            if "MemTotal" in line:
              total_mem = int(line.split()[1]) * 1024  # kB -> bytes
              break
          else:
            total_mem = 0

        # Minimal stand-in object with the .total attribute the later code expects.
        gpu_memory_info = type('', (object,), {"total": total_mem})()
      # ... (snippet truncated here; the rest of the function is unchanged) ...
```
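If you are unsure which compatible string your board reports (and therefore which branch above it will hit), a quick standalone check in plain Python:

```python
# Print the NUL-separated compatible strings from the device tree (Jetson/ARM only).
with open("/proc/device-tree/compatible") as f:
    print(f.read().replace("\x00", " "))
```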

By the way, I do not know the accurate TFLOPS figures for these Jetson devices, and the TFLOPS configuration does not seem to have any impact anyway, but the memory configuration is important because it determines model sharding.
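(For context, exo's default partitioning strategy is ring memory weighted partitioning: each node is assigned a share of the model's layers proportional to its reported memory, which is why the memory value matters here.)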

Mr-lwd · Mar 18, 2025