download_file is corrupting the file if downloaded_size is not equal to total_size
I was getting errors like "Error processing prompt: Invalid control character at: line 563 column 62 (char 47769)" while running exo. While investigating the problem I came across a corrupted index.json file:
% cat /Volumes/Palladium/cache/huggingface/hub/models--mlx-community--DeepSeek-R1-Distill-Qwen-1.5B-3bit/snapshots/17131b28b8190d7b2aceb9b98926f263606a11ee/model.safetensors.index.json | tail -n 10
        "model.layers.9.self_attn.q_proj.scales": "model.safetensors",
        "model.layers.9.self_attn.q_proj.weight": "model.safetensors",
        "model.layers.9.self_attn.v_proj.bias": "model.safetensors",
        "model.layers.9.self_attn.v_proj.biases": "model.safetensors",
        "model.layers.9.self_attn.v_proj.scales": "model.safetensors",
        "model.layers.9.self_attn.v_proj.weight": "model.safetensors",
        "model.norm.weight": "model.safetensors"
    }
}
Range not satisfiableRange not satisfiable%
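For anyone who wants to check other cached JSON files for the same kind of damage, here is a minimal sketch. The function name `find_trailing_garbage` is hypothetical and not part of exo. It only relies on `json.JSONDecoder.raw_decode` from the standard library, which parses one JSON value and reports where it stopped:

```python
import json


def find_trailing_garbage(text: str) -> str:
    """Return any non-whitespace text that follows the first complete JSON value.

    Raises json.JSONDecodeError if the JSON value itself is malformed.
    """
    stripped = text.lstrip()
    # raw_decode returns (parsed_object, index_where_parsing_stopped)
    _, end = json.JSONDecoder().raw_decode(stripped)
    return stripped[end:].strip()


if __name__ == "__main__":
    # Made-up miniature of the corrupted file shown above
    sample = (
        '{"weight_map": {"model.norm.weight": "model.safetensors"}}\n'
        "Range not satisfiableRange not satisfiable"
    )
    print(repr(find_trailing_garbage(sample)))
```

An empty return value means nothing was appended after the JSON document. Anything else is leftover text, such as the error body above.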
With the help of ChatGPT (o1), I tracked it down to a bug in hf_helpers.py:
elif response.status == 416:  # Range not satisfiable
  content_range = response.headers.get('Content-Range', '')
  try:
    total_size = int(content_range.split('/')[-1])
    if downloaded_size == total_size:
      if DEBUG >= 2: print(f"File fully downloaded on first pass: {file_path}")
      ...
      return
  except ValueError:
    if DEBUG >= 1: print(f"Failed to parse Content-Range header: {content_range}. Starting download from scratch...")
    return await download_file(..., use_range_request=False)
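As ChatGPT says, "After that code block, there’s no return or raise if downloaded_size != total_size. The function simply “falls through” to this part." Here "this part" is presumably the code further down that writes the response body to the local file, which would explain the "Range not satisfiable" text appended to the end of the index.json file above.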
This looks like a bug to me. Here is the fix suggested by ChatGPT o1 (after a few suggestions I made):
diff --git a/exo/download/hf/hf_helpers.py b/exo/download/hf/hf_helpers.py
index 8be12b9..ff3156c 100644
--- a/exo/download/hf/hf_helpers.py
+++ b/exo/download/hf/hf_helpers.py
@@ -202,6 +202,29 @@ async def download_file(
           if progress_callback:
             await progress_callback(RepoFileProgressEvent(repo_id, revision, file_path, downloaded_size, downloaded_this_session, total_size, 0, timedelta(0), "complete"))
           return
+        if downloaded_size > total_size:
+          # Local file is bigger than remote file => definitely corrupted
+          if DEBUG >= 1:
+            print(
+              f"Local file size ({downloaded_size}) is greater than remote file size ({total_size}). "
+              "Removing local file and restarting download from scratch..."
+            )
+          # Remove the local file and redownload entirely
+          await aios.remove(local_path)
+          return await download_file(
+            session, repo_id, revision, file_path, save_directory, progress_callback, use_range_request=False
+          )
+
+        if downloaded_size < total_size:
+          # We haven't yet downloaded the full file, so the requested range is invalid
+          if DEBUG >= 1:
+            print(
+              f"Partial file mismatch: local={downloaded_size}, total={total_size}. "
+              "Retrying from scratch without range requests..."
+            )
+          return await download_file(
+            session, repo_id, revision, file_path, save_directory, progress_callback, use_range_request=False
+          )
       except ValueError:
         if DEBUG >= 1: print(f"Failed to parse Content-Range header: {content_range}. Starting download from scratch...")
         return await download_file(session, repo_id, revision, file_path, save_directory, progress_callback, use_range_request=False)
Seems to work. Let me know.
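To make the intent of the patch easier to review, here is a rough standalone sketch of the decision it makes on a 416 response. This is not the code in hf_helpers.py: `Action`, `parse_total_size`, and `action_for_416` are hypothetical names, and the byte counts in the example are made up. The point is that every branch returns something, so nothing can fall through to the code that writes to the file:

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALREADY_COMPLETE = "already_complete"      # local size matches remote size
    DELETE_AND_RESTART = "delete_and_restart"  # local file is larger, so corrupted
    RESTART = "restart"                        # redownload without a Range header


def parse_total_size(content_range: str) -> Optional[int]:
    """Parse the total size from a header such as 'bytes */1000'.

    Returns None when the header is missing or the total is not a number.
    """
    try:
        return int(content_range.split('/')[-1])
    except ValueError:
        return None


def action_for_416(downloaded_size: int, content_range: str) -> Action:
    """Decide what to do after a 416 response. Every path returns explicitly."""
    total_size = parse_total_size(content_range)
    if total_size is None:
        return Action.RESTART
    if downloaded_size == total_size:
        return Action.ALREADY_COMPLETE
    if downloaded_size > total_size:
        return Action.DELETE_AND_RESTART
    return Action.RESTART


if __name__ == "__main__":
    # Hypothetical sizes: the remote file is 1000 bytes in every case
    for local_size in (1000, 1042, 400):
        print(local_size, action_for_416(local_size, "bytes */1000"))
```

In the patch itself, both restart cases call download_file again with use_range_request=False. The only difference is that the oversized local file is removed first.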
Should be fixed with https://github.com/exo-explore/exo/pull/640. Let me know if it's still an issue.
Is it possible that it is no longer respecting HF_HOME? It seems to be downloading everything to ~/.cache/exo/downloads/, even though I have set a different folder for HF_HOME.
EDIT: Never mind, I see that HF_HOME has changed to EXO_HOME.
EDIT 2: I see you also changed from hub/ to downloads/. I wonder if this means that I shouldn't be trying to share my huggingface directory between exo and other programs that use huggingface or if I just need to set the env variables up in order to do this. (I'd prefer not to use a symbolic link but that would be another alternative.)
EDIT 3: Okay, it looks like the directory format has completely changed. I'm also getting weird behavior when HF_HOME and EXO_HOME are both set, so I'm not sure how to transition everything without re-downloading all the models.
Fixed in 1.0.