exo icon indicating copy to clipboard operation
exo copied to clipboard

https://huggingface.co/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json is not a valid link

Open bfolkjr opened this issue 10 months ago β€’ 29 comments

Good morning. I'm a Linux Mint user and have installed exo. When exo starts, I get the following: File "/data/bt/exo/exo/download/new_shard_download.py", line 155, in _download_file assert r.status in [200, 206], f"Failed to download {path} from {url}: {r.status}" ^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Failed to download model.safetensors.index.json from https://huggingface.co/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401 Download error on attempt 21/30 for repo_id='TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R' revision='main' path='model.safetensors.index.json' Going to the URL in a browser, yield the following result: "Repository not found"

Is this hardcoded in exo or is it some environment variable I need to change locally? Using grep, I show multiple binary files when searching for the URL, but nothing that I can modify. Any help would be appreciated. exo does in fact start and can see other nodes, but the chat never returns a response. Has anyone else experienced this? Thanks in advance...

bfolkjr avatar Feb 23 '25 16:02 bfolkjr

hi, i am facing the same error and i tried to open the URL manually ,but got page not found error.

kamidehlvi avatar Feb 23 '25 18:02 kamidehlvi

I hope the devs or other users can give us a hand soon. I love the idea of exo and want to use it.

bfolkjr avatar Feb 23 '25 18:02 bfolkjr

Does anyone know where this URL is being set? AssertionError: Failed to download model.safetensors.index.json from https://huggingface.co/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401 I really want to get this tested and then demo it at work.
I have the following environment variables set also.
export HF_HOME=/data/exo_models export HF_TOKEN=my_token for hf export TRANSFORMERS_OFFLINE=1

bfolkjr avatar Feb 24 '25 13:02 bfolkjr

Does anyone know where this URL is being set? AssertionError: Failed to download model.safetensors.index.json from https://huggingface.co/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401 I really want to get this tested and then demo it at work. I have the following environment variables set also. export HF_HOME=/data/exo_models export HF_TOKEN=my_token for hf export TRANSFORMERS_OFFLINE=1

I removed it from models.py (currently line 80)

markmcnaughton avatar Feb 24 '25 17:02 markmcnaughton

Does anyone know where this URL is being set? AssertionError: Failed to download model.safetensors.index.json from https://huggingface.co/TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R/resolve/main/model.safetensors.index.json: 401 I really want to get this tested and then demo it at work. I have the following environment variables set also. export HF_HOME=/data/exo_models export HF_TOKEN=my_token for hf export TRANSFORMERS_OFFLINE=1

I removed it from models.py (currently line 80)

Thanks Mark!. I am making a little more progress with exo now!

bfolkjr avatar Feb 24 '25 19:02 bfolkjr

I made a little progress, but now I am getting the following (when attempting to generate a poem for Mark): File "/data/bt3/exo/exo/download/new_shard_download.py", line 164, in _download_file raise Exception(f"Downloaded file {target_dir/path} has hash {final_hash} but remote hash is {remote_hash}") Exception: Downloaded file /tmp/exo/NousResearch--Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json has hash d842d2f984b5f2c0e3f4ed15f81b79b2ffe6f283 but remote hash is 37b1afe63cadc4ddce30aaff1b149c2f3083650c

"/data/bt3/exo/exo/download/" is not my path for downloaded models. My path is /data/exo_models. Where is that path coming from?

bfolkjr avatar Feb 24 '25 20:02 bfolkjr

I’m guessing you could edit your local file and change the hash to match the remote. Not sure if thats the solution or why this would be the case.

On Mon, 24 Feb 2025 at 22:36, bfolkjr @.***> wrote:

I made a little progress, but now I am getting the following (when attempting to generate a poem for Mark): File "/data/bt3/exo/exo/download/new_shard_download.py", line 164, in _download_file raise Exception(f"Downloaded file {target_dir/path} has hash {final_hash} but remote hash is {remote_hash}") Exception: Downloaded file /tmp/exo/NousResearch--Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json has hash d842d2f984b5f2c0e3f4ed15f81b79b2ffe6f283 but remote hash is 37b1afe63cadc4ddce30aaff1b149c2f3083650c

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2679587110, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADE3YVNNKWTO6GWLUAICELD2RN7EPAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZZGU4DOMJRGA . You are receiving this because you commented.Message ID: @.***> [image: bfolkjr]bfolkjr left a comment (exo-explore/exo#728) https://github.com/exo-explore/exo/issues/728#issuecomment-2679587110

I made a little progress, but now I am getting the following (when attempting to generate a poem for Mark): File "/data/bt3/exo/exo/download/new_shard_download.py", line 164, in _download_file raise Exception(f"Downloaded file {target_dir/path} has hash {final_hash} but remote hash is {remote_hash}") Exception: Downloaded file /tmp/exo/NousResearch--Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json has hash d842d2f984b5f2c0e3f4ed15f81b79b2ffe6f283 but remote hash is 37b1afe63cadc4ddce30aaff1b149c2f3083650c

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2679587110, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADE3YVNNKWTO6GWLUAICELD2RN7EPAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZZGU4DOMJRGA . You are receiving this because you commented.Message ID: @.***>

markmcnaughton avatar Feb 24 '25 21:02 markmcnaughton

Wait. What local file Mark?

bfolkjr avatar Feb 24 '25 21:02 bfolkjr

I think this, /tmp/exo/NousResearch--Meta-Llama-3.1-70B-Instruct/model. safetensors.index.json

On Mon, 24 Feb 2025 at 23:30, bfolkjr @.***> wrote:

Wait. What local file Mark?

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2679693135, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADE3YVMLN6TY4UVHTJCOJVT2ROFOHAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZZGY4TGMJTGU . You are receiving this because you commented.Message ID: @.***> [image: bfolkjr]bfolkjr left a comment (exo-explore/exo#728) https://github.com/exo-explore/exo/issues/728#issuecomment-2679693135

Wait. What local file Mark?

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2679693135, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADE3YVMLN6TY4UVHTJCOJVT2ROFOHAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZZGY4TGMJTGU . You are receiving this because you commented.Message ID: @.***>

markmcnaughton avatar Feb 24 '25 21:02 markmcnaughton

Thanks again Mark!

bfolkjr avatar Feb 24 '25 21:02 bfolkjr

Unfortunately, the file doesn't contain any hash info...

bfolkjr avatar Feb 24 '25 22:02 bfolkjr

hello,did you solve the issue?thanks

Fahad16301139 avatar Feb 25 '25 05:02 Fahad16301139

Hey Fahad, The issue with the hash is a show stopper for me. Have you made any progress?

bfolkjr avatar Feb 25 '25 12:02 bfolkjr

This has been the result of trying to generate a prompt: has_read=True, has_write=True 0%| | 0/148 [00:00<?, ?it/s] ram used: 0.00 GB, layers.0.attention.wq.weight : 1%|β–Œ | 1/148 [00:00<00:11, 12.26it/s] ram used: 0.01 GB, layers.0.attention.wk.weight : 1%|β–ˆβ– | 2/148 [00:00<00:08, 16.56it/s] ram used: 0.01 GB, layers.0.attention.wv.weight : 2%|β–ˆβ–‹ | 3/148 [00:00<00:06, 20.76it/s] ram used: 0.01 GB, layers.0.attention.wo.weight : 3%|β–ˆβ–ˆβ–Ž | 4/148 [00:00<00:06, 21.64it/s] ram used: 0.02 GB, layers.0.feed_forward.w1.weight : 3%|β–ˆβ–ˆβ–Š | 5/148 [00:00<00:08, 17.68it/s] ╭───────────────────────────────────────────────────────────────────────────────────── Exo Cluster (1 nod It stops and hangs. I deleted "exo" references from /tmp also and downloaded 3.2 1B again.

bfolkjr avatar Feb 25 '25 12:02 bfolkjr

hello, i am having the same issue like yours it stops and hangs at 3 percent. could you tell me where can i find the exo references in tmp folder? in linux there are no exo references in my tmp folder.thanks!

Fahad16301139 avatar Feb 25 '25 16:02 Fahad16301139

Hey. Mine did have exo instances in /top, but removing them didn't help at all. I'm still in the same situation.

On Tue, Feb 25, 2025, 11:45β€―AM Fahad Hossain @.***> wrote:

hello, i am having the same issue like yours it stops and hangs at 3 percent. could you tell me where can i find the exo references in tmp folder? in linux there are no exo references in my tmp folder.thanks!

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2682626328, or unsubscribe https://github.com/notifications/unsubscribe-auth/BPZXQ6FWSL27OSKX6IIQSHL2RSM33AVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBSGYZDMMZSHA . You are receiving this because you authored the thread.Message ID: @.***> [image: Fahad16301139]Fahad16301139 left a comment (exo-explore/exo#728) https://github.com/exo-explore/exo/issues/728#issuecomment-2682626328

hello, i am having the same issue like yours it stops and hangs at 3 percent. could you tell me where can i find the exo references in tmp folder? in linux there are no exo references in my tmp folder.thanks!

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2682626328, or unsubscribe https://github.com/notifications/unsubscribe-auth/BPZXQ6FWSL27OSKX6IIQSHL2RSM33AVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBSGYZDMMZSHA . You are receiving this because you authored the thread.Message ID: @.***>

bfolkjr avatar Feb 25 '25 16:02 bfolkjr

Is this a new bug? i was able to run it couple weeks ago on one node

Fahad16301139 avatar Feb 25 '25 16:02 Fahad16301139

Good question. This is my first time trying. It hasn't worked at all for me on one node or multiple.

On Tue, Feb 25, 2025, 11:57β€―AM Fahad Hossain @.***> wrote:

Is this a new bug? i was able to run it couple weeks ago on one node

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2682659710, or unsubscribe https://github.com/notifications/unsubscribe-auth/BPZXQ6F3BMWZ2OILPRU7WGD2RSOGFAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBSGY2TSNZRGA . You are receiving this because you authored the thread.Message ID: @.***> [image: Fahad16301139]Fahad16301139 left a comment (exo-explore/exo#728) https://github.com/exo-explore/exo/issues/728#issuecomment-2682659710

Is this a new bug? i was able to run it couple weeks ago on one node

β€” Reply to this email directly, view it on GitHub https://github.com/exo-explore/exo/issues/728#issuecomment-2682659710, or unsubscribe https://github.com/notifications/unsubscribe-auth/BPZXQ6F3BMWZ2OILPRU7WGD2RSOGFAVCNFSM6AAAAABXWN7O62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBSGY2TSNZRGA . You are receiving this because you authored the thread.Message ID: @.***>

bfolkjr avatar Feb 25 '25 16:02 bfolkjr

I also had to comment line 80 from models.py in order to continue loading exo.

borch84 avatar Feb 26 '25 02:02 borch84

Yeah. I did the same. I really wish I could have done a demo for work. Maybe soon.

bfolkjr avatar Feb 26 '25 12:02 bfolkjr

What device are you running on? The error in the original post is harmless.

AlexCheema avatar Feb 27 '25 11:02 AlexCheema

The 2 workstations I am using are very similar: Linux Mint 21.3 HP Z840 workstation 256GB of RAM

Detected system: Linux Inference engine name after selection: tinygrad Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: SingletonShardDownloader [60742] Chat interface started:

  • http://10.0.0.101:52415
  • http://127.0.0.1:52415 ChatGPT API endpoint served at:
  • http://10.0.0.101:52415/v1/chat/completions
  • http://127.0.0.1:52415/v1/chat/completions has_read=True, has_write=True

Image Image Image Image

bfolkjr avatar Feb 27 '25 13:02 bfolkjr

It hangs at that point in the last screenshot after a model is downloaded. I am only trying to get it to run on one node before adding more.

bfolkjr avatar Feb 27 '25 13:02 bfolkjr

192769 bfolk 0 Compute 0% 186MiB 2% 100% 1605MiB /home/bfolk/.pyenv/versions/3.12.9/bin/python3.12 /home/bfolk/.pyenv/versions/3.12.9/bin/exo

That is the exo process from nvtop

bfolkjr avatar Feb 27 '25 13:02 bfolkjr

I don't know why it keeps downloading 70B in a loop, it downloads every time I restart, and this model keeps spinning in webui, even though I only clicked its download button once

Image

eastjoe avatar Mar 03 '25 08:03 eastjoe

I'm pretty sure I get the same eastjoe. I wish this worked. I start a new position on the 17th and really want to test this.

bfolkjr avatar Mar 06 '25 18:03 bfolkjr

Please don't forget about me lol

bfolkjr avatar Mar 09 '25 02:03 bfolkjr

βœ… [FIX] How to prevent Exo from auto-downloading the default model (TriAiExperiments/LLaMA…)

If you’re running into annoying automatic downloads of a private model like:

TriAiExperiments/SFR-Iterative-DPO-LLaMA-3-70B-R

…and getting errors like 401 Unauthorized, here’s the definitive fix before installing Exo:

βΈ»

πŸ› οΈ Steps to fix the issue before running installation: 1. Edit model.py β€’ Go to exo/src/exo/model.py β€’ At line 80, there’s a default model hardcoded β€’ ❌ Delete the entire object defining the default model (not just the line) 2. Edit test file β€’ Go to exo/test/test_tokenizer.py β€’ You’ll find another reference to that same model β€’ ❌ Remove it entirely 3. βœ… Then you can install Exo normally:

pip install -e .

βΈ»

πŸŽ‰ Boom! Now Exo will no longer try to download the default private model on launch.

Enjoy hacking with Exo πŸ’»πŸš€

WhiteMordred avatar Apr 26 '25 14:04 WhiteMordred

Good question. This is my first time trying. It hasn't worked at all for me on one node or multiple. …

Any update? 😒

dengbuqi avatar Apr 29 '25 09:04 dengbuqi

Image

dexter74 avatar Aug 06 '25 23:08 dexter74