Jetson devices reload the model and the inference speed is too slow (llama3.2-1b, 8bit)
```
ram used: 0.14 GB, layers.1.feed_forward.w1.weight :   9%|█         | 14/148 [00:00<00:03, 33.65it/s]
ram used: 0.18 GB, layers.1.feed_forward.w2.weight :  10%|█         | 15/148 [00:00<00:03, 33.42it/s]
...
ram used: 1.74 GB, freqs_cis                       : 100%|██████████| 148/148 [00:03<00:00, 43.52it/s]
loaded weights in 3409.17 ms, 1.74 GB loaded at 0.51 GB/s

Download error on attempt 4/30 for repo_id='TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R' revision='main' path='model.safetensors.index.json' target_dir=PosixPath('/tmp/exo/TriAiExperiments--SFR-Iterative-DPO-LLaMA-3-70B-R')
Traceback (most recent call last):
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 134, in download_file_with_retry
    try: return await _download_file(repo_id, revision, path, target_dir, on_progress)
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 156, in _download_file
    assert r.status in [200, 206], f"Failed to download {path} from {url}: {r.status}"
AssertionError: Failed to download model.safetensors.index.json from https://hf-mirror.com/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401

  0%|          | 0/148 [00:00<?, ?it/s]
ram used: 1.74 GB, layers.0.attention.wq.weight    :   1%|          | 1/148 [00:00<00:03, 42.65it/s]
ram used: 1.75 GB, layers.0.attention.wk.weight    :   1%|          | 2/148 [00:00<00:02, 49.67it/s]
...
Download error on attempt 5/30 for repo_id='TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R' revision='main' path='model.safetensors.index.json' target_dir=PosixPath('/tmp/exo/TriAiExperiments--SFR-Iterative-DPO-LLaMA-3-70B-R')
...
ram used: 3.48 GB, freqs_cis                       : 100%|██████████| 148/148 [00:03<00:00, 41.05it/s]
loaded weights in 3614.36 ms, 1.74 GB loaded at 0.48 GB/s
```
Sorry to bother you, but I'd love to know the solution.
I can run exo successfully on my devices (Jetson Orin Nano 8GB + Jetson Orin NX 16GB), but the inference speed is too slow (1~2 tokens per second for llama3.2:1b-8bit). Yet when I watch "jtop", the GPU appears to be running at a high frequency.
With Ollama, even deepseek-r1 (7b, 4.9 GB, 4-bit) runs at high speed (7 tps on the Jetson Orin Nano 8GB).
So I don't think the problem is simply that Jetson devices can't run any 8-bit LLM at high speed.
I may have figured out why the speed is so slow: Jetson devices only support Python 3.10 and torch 2.5~2.6. They can create a Python 3.12 venv, but CUDA is not available inside it (torch.cuda.is_available() returns False).
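A quick way to confirm this from inside the venv that exo uses (a minimal check; what it prints depends on whichever torch wheel is actually installed on the device):

```python
# Run inside the venv used by exo. On a Jetson Python 3.12 venv this
# typically shows a CPU-only torch build, i.e. cuda available == False.
import torch

print("torch version:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```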
Hi, I installed Python 3.12 on my Jetson Orin NX 16GB and installed exo successfully, but it reports 0 (zero) TFLOPS; my other machine, a Mac mini M4 32GB, reports 8.9 TFLOPS. When I run exo on the Jetson and the Mac mini they discover each other, but the total is still only 8.9 TFLOPS. I have downloaded deepseek-r1:32b with Ollama on my Mac mini, but when I send a request with "curl --" I get no response. Could you help me run exo with the Ollama deepseek-r1:32b?
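(For anyone hitting the same thing: as far as I know, exo does not reuse models downloaded by Ollama; it downloads its own weights and serves a ChatGPT-compatible HTTP API, so the request has to target exo's endpoint with one of exo's model IDs. Below is a minimal sketch of such a request; the port 52415 and the model ID "llama-3.2-1b" are assumptions to verify against exo's startup log and its model list.)

```python
# Sketch of a request against exo's ChatGPT-compatible API (standard library only).
# Assumptions: exo runs on this host, listens on port 52415 (check the startup
# log for the real address), and "llama-3.2-1b" is a model ID exo actually offers.
import json
import urllib.request

payload = {
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello from a Jetson + Mac mini cluster"}],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=300) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```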
Hello, it seems that the newest LLMs only run on the MLX backend, not on tinygrad for CUDA devices, so you cannot run DeepSeek on a Jetson. Moreover, to solve the "0 TFLOPS" problem you must modify the configuration file exo/topology/device_capabilities.py, adding entries such as:
```python
CHIP_FLOPS = {
  # ... existing entries ...
  "Jetson_Orin_Nano": DeviceFlops(fp32=1.28*TFLOPS, fp16=2.56*TFLOPS, int8=40*TFLOPS),
  "Jetson_Orin_NX": DeviceFlops(fp32=1.88*TFLOPS, fp16=3.76*TFLOPS, int8=100*TFLOPS),
  # ...
}
```
and change the memory-detection logic in `linux_device_capabilities()` in the same file so that Jetson boards are recognized and their unified memory is reported:

```python
async def linux_device_capabilities() -> DeviceCapabilities:
  import psutil
  from tinygrad import Device

  if DEBUG >= 2: print(f"tinygrad {Device.DEFAULT=}")

  if Device.DEFAULT == "CUDA" or Device.DEFAULT == "NV" or Device.DEFAULT == "GPU":
    try:
      # Discrete NVIDIA GPUs: query the name and memory through NVML.
      import pynvml
      pynvml.nvmlInit()
      handle = pynvml.nvmlDeviceGetHandleByIndex(0)
      gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
      gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
      gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
      pynvml.nvmlShutdown()
    except Exception as e:
      if DEBUG >= 2: print(f"pynvml failed: {e}")
      # Jetson boards have no NVML, so identify the SoC from the device tree instead.
      try:
        with open("/proc/device-tree/compatible") as f:
          compatible = f.read().lower()
        if "tegra194" in compatible:
          gpu_name = "XAVIER"
        elif "tegra210" in compatible:
          gpu_name = "TX1"
        elif "tegra186" in compatible:
          gpu_name = "TX2"
        elif "p3768-0000+p3767-0003" in compatible:
          gpu_name = "Jetson_Orin_Nano"
        elif "p3768-0000+p3767-0000" in compatible:
          gpu_name = "Jetson_Orin_NX"
        else:
          gpu_name = "JETSON_GPU"
        # Jetson has unified memory, so report total system RAM as the GPU memory.
        with open("/proc/meminfo") as f:
          for line in f:
            if "MemTotal" in line:
              total_mem = int(line.split()[1]) * 1024
              break
          else:
            total_mem = 0
        gpu_memory_info = type('', (object,), {"total": total_mem})()
      except Exception as e:
        if DEBUG >= 2: print(f"device-tree detection failed: {e}")
        gpu_name = "JETSON_GPU"
        gpu_memory_info = type('', (object,), {"total": psutil.virtual_memory().total})()
    # The rest of the function (building the DeviceCapabilities from gpu_name,
    # gpu_memory_info and CHIP_FLOPS) stays as in the original file.
```
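If you want to confirm which branch the detection above will take on your board before restarting exo, you can inspect the device-tree compatible string directly (a small standalone check, not part of exo):

```python
# Standalone check: print the device-tree compatible string and the gpu_name
# that the detection logic above would derive from it.
with open("/proc/device-tree/compatible") as f:
    compatible = f.read().lower()

print(repr(compatible))
if "p3768-0000+p3767-0003" in compatible:
    print("detected: Jetson_Orin_Nano")
elif "p3768-0000+p3767-0000" in compatible:
    print("detected: Jetson_Orin_NX")
elif any(t in compatible for t in ("tegra194", "tegra210", "tegra186")):
    print("detected: older Jetson (XAVIER/TX1/TX2)")
else:
    print("detected: JETSON_GPU (generic fallback)")
```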
By the way, I don't know the exact TFLOPS figures for Jetson devices, and the TFLOPS configuration does not seem to have any effect, but the memory configuration is important because it determines how the model is sharded across nodes.
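To illustrate why the reported memory matters (this is only a sketch of memory-weighted splitting with a hypothetical helper, not exo's actual partitioning code): if layers are assigned to nodes in proportion to each node's reported memory, a node that reports zero or a wrong total gets a correspondingly wrong share of the model.

```python
# Sketch: assign a contiguous range of layers to each node in proportion to
# its reported memory. Hypothetical helper, only to show why the memory value
# from device_capabilities matters for sharding.
def split_layers_by_memory(num_layers: int, node_memory: dict[str, int]) -> dict[str, range]:
    total_mem = sum(node_memory.values())
    shards, start = {}, 0
    for i, (node, mem) in enumerate(node_memory.items()):
        if i == len(node_memory) - 1:
            end = num_layers  # last node takes the remainder
        else:
            end = start + round(num_layers * mem / total_mem)
        shards[node] = range(start, end)
        start = end
    return shards

# Example: Orin Nano 8 GB + Orin NX 16 GB sharing the 16 layers of llama3.2-1b.
print(split_layers_by_memory(16, {"orin_nano_8gb": 8, "orin_nx_16gb": 16}))
# -> {'orin_nano_8gb': range(0, 5), 'orin_nx_16gb': range(5, 16)}
```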