
Jetson devices reload the model and the inference speed is too slow (llama3.2-1b, 8bit)

Open Mr-lwd opened this issue 9 months ago • 5 comments

ram used: 0.14 GB, layers.1.feed_forward.w1.weight :   9% | 14/148 [00:00<00:03, 33.65it/s]
[... per-weight loading progress for layers 1-15 elided ...]
ram used: 1.22 GB, norm.weight :  98% | 145/148 [00:02<00:00, 55.90it/s]
Download error on attempt 4/30 for repo_id='TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R' revision='main' path='model.safetensors.index.json' target_dir=PosixPath('/tmp/exo/TriAiExperiments--SFR-Iterative-DPO-LLaMA-3-70B-R')
Traceback (most recent call last):
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 134, in download_file_with_retry
    try: return await _download_file(repo_id, revision, path, target_dir, on_progress)
  File "/home/jetnx/Desktop/llm/exo/exo/download/new_shard_download.py", line 156, in _download_file
    assert r.status in [200, 206], f"Failed to download {path} from {url}: {r.status}"
AssertionError: Failed to download model.safetensors.index.json from https://hf-mirror.com/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401
ram used: 1.74 GB, freqs_cis : 100% | 148/148 [00:03<00:00, 43.52it/s]
loaded weights in 3409.17 ms, 1.74 GB loaded at 0.51 GB/s
  0% | 0/148 [00:00<?, ?it/s]
ram used: 1.74 GB, layers.0.attention.wq.weight :   1% | 1/148 [00:00<00:03, 42.65it/s]
[... all 148 weights are loaded again from scratch; a "Download error on attempt 5/30" with the identical 401 traceback is interleaved; ram climbs to 3.48 GB ...]
ram used: 3.48 GB, freqs_cis : 100% | 148/148 [00:03<00:00, 41.05it/s]
loaded weights in 3614.36 ms, 1.74 GB loaded at 0.48 GB/s

Mr-lwd · Mar 11, 2025

Sorry to bother you, but I'd love to know the solution.

Mr-lwd · Mar 11, 2025

[Image]

I can successfully run exo on my devices (Jetson Orin Nano 8GB + Jetson Orin NX 16GB); however, inference is very slow (1–2 tokens per second for llama3.2:1b-8bit), even though "jtop" shows the GPU running at a high frequency.

In Ollama, even deepseek-r1 (7B, 4.9 GB, 4-bit) runs at high speed (7 tok/s on the Jetson Orin Nano 8GB).

So I'm not convinced that Jetson devices simply can't run any 8-bit LLM at high speed.
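For reference, here is a rough way to measure tokens per second against exo's ChatGPT-compatible API. This is only a sketch: the port and model name follow the examples in exo's README, and it assumes the response includes an OpenAI-style usage field; adjust for your setup.

```python
import time
import requests  # pip install requests

t0 = time.time()
r = requests.post(
    "http://localhost:52415/v1/chat/completions",  # exo's default API endpoint per the README
    json={
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "Count from 1 to 50."}],
    },
    timeout=600,
)
elapsed = time.time() - t0
tokens = r.json()["usage"]["completion_tokens"]  # assumes an OpenAI-style usage field
print(f"{tokens} tokens in {elapsed:.1f} s -> {tokens / elapsed:.2f} tok/s")
```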

Mr-lwd · Mar 12, 2025

I may have figured out why the speed is so slow. Jetson devices only support Python 3.10 with torch 2.5–2.6; you can create a venv with Python 3.12, but CUDA is not available in it (torch.cuda.is_available() returns False).
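A quick way to verify this in each environment (standard PyTorch calls, nothing exo-specific):

```python
import torch

# On Jetson, NVIDIA's CUDA-enabled torch wheels target Python 3.10;
# in a Python 3.12 venv you typically end up with a CPU-only build.
print(torch.__version__)
print(torch.cuda.is_available())  # True under Python 3.10, False under 3.12
```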

Mr-lwd · Mar 12, 2025

Hi, I installed Python 3.12 on my Jetson Orin NX 16GB and exo installed fine, but it reports 0 (zero) TFLOPS; my Mac mini M4 32GB shows about 8.9 TFLOPS. When I run exo on the Jetson and the Mac mini they can discover each other, but the total is still only 8.9 TFLOPS. I downloaded deepseek-r1:32b with Ollama on my Mac mini, but when I query it with "curl --" I do not get any response. Could you help me run exo with the Ollama deepseek-r1:32b?

dongpan90 · Mar 17, 2025

> Hi, I installed Python 3.12 on my Jetson Orin NX 16GB and exo installed fine, but it reports 0 (zero) TFLOPS; [...] Could you help me run exo with the Ollama deepseek-r1:32b?

Hello, it seems that the newest LLMs can only run on MLX, not on tinygrad for devices with CUDA, so you cannot run deepseek on a Jetson. As for the "0 TFLOPS" problem, you have to add your devices to CHIP_FLOPS in exo/topology/device_capabilities.py:

```python
CHIP_FLOPS = {
  # ... existing entries ...
  "Jetson_Orin_Nano": DeviceFlops(fp32=1.28*TFLOPS, fp16=2.56*TFLOPS, int8=40*TFLOPS),
  "Jetson_Orin_NX": DeviceFlops(fp32=1.88*TFLOPS, fp16=3.76*TFLOPS, int8=100*TFLOPS),
}
```
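(For what it's worth, those int8 figures line up with NVIDIA's advertised 40 TOPS for the Orin Nano 8GB and 100 TOPS for the Orin NX 16GB, so they should at least be the right order of magnitude.)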

and change the memory-capability detection in linux_device_capabilities() in device_capabilities.py:

```python
async def linux_device_capabilities() -> DeviceCapabilities:
  import psutil
  from tinygrad import Device

  if DEBUG >= 2: print(f"tinygrad {Device.DEFAULT=}")
  if Device.DEFAULT == "CUDA" or Device.DEFAULT == "NV" or Device.DEFAULT == "GPU":
    try:
      import pynvml
      pynvml.nvmlInit()
      handle = pynvml.nvmlDeviceGetHandleByIndex(0)
      gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
      gpu_name = gpu_raw_name.rsplit(" ", 1)[0] if gpu_raw_name.endswith("GB") else gpu_raw_name
      gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
      pynvml.nvmlShutdown()
    except Exception as e:
      if DEBUG >= 2: print(f"pynvml failed: {e}")
      # pynvml does not work on Jetson, so identify the board from the device tree instead.
      try:
        with open("/proc/device-tree/compatible") as f:
          compatible = f.read().lower()
        if "tegra194" in compatible: gpu_name = "XAVIER"
        elif "tegra210" in compatible: gpu_name = "TX1"
        elif "tegra186" in compatible: gpu_name = "TX2"
        elif "p3768-0000+p3767-0003" in compatible: gpu_name = "Jetson_Orin_Nano"
        elif "p3768-0000+p3767-0000" in compatible: gpu_name = "Jetson_Orin_NX"
        else: gpu_name = "JETSON_GPU"

        # Jetson has unified memory, so report total system RAM as the "GPU" memory.
        with open("/proc/meminfo") as f:
          for line in f:
            if "MemTotal" in line:
              total_mem = int(line.split()[1]) * 1024  # kB -> bytes
              break
          else:
            total_mem = 0

        # Minimal stand-in object with the .total attribute the later code expects.
        gpu_memory_info = type('', (object,), {"total": total_mem})()
      # ... (snippet truncated here; the rest of the function is unchanged) ...
```
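If you are unsure which compatible string your board reports (and therefore which branch above it will hit), a quick standalone check in plain Python:

```python
# Print the NUL-separated compatible strings from the device tree (Jetson/ARM only).
with open("/proc/device-tree/compatible") as f:
    print(f.read().replace("\x00", " "))
```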

By the way, I do not know the accurate TFLOPS figures for these Jetson devices, and the TFLOPS configuration does not seem to have any impact anyway, but the memory configuration is important because it determines model sharding.
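(For context, exo's default partitioning strategy is ring memory weighted partitioning: each node is assigned a share of the model's layers proportional to its reported memory, which is why the memory value matters here.)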

Mr-lwd · Mar 18, 2025