
Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling

Open · FurkanGozukara opened this issue 10 months ago • 25 comments

I am using this in my installers and it is working great for me:

set HF_HUB_ENABLE_HF_TRANSFER=1

However, some people are getting the error below. Is there any way to fix it, or some automatic fallback so it will still work?

model.ckpt:  11%|███████                                                           | 262M/2.47G [02:29<20:58, 1.75MB/s]
Traceback (most recent call last):
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\file_download.py", line 426, in http_get
    hf_transfer.download(
Exception: Error while removing corrupted file: The process cannot access the file because it is being used by another process. (os error 32)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\app.py", line 41, in 
    aura_sr = AuraSR.from_pretrained("fal/AuraSR-v2")
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\aura_sr.py", line 807, in from_pretrained
    hf_model_path = Path(snapshot_download(model_id))
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\_snapshot_download.py", line 294, in snapshot_download
    _inner_hf_hub_download(file)
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\_snapshot_download.py", line 270, in _inner_hf_hub_download
    return hf_hub_download(
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\file_download.py", line 860, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\file_download.py", line 1009, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\file_download.py", line 1543, in _download_to_tmp_and_move
    http_get(
  File "C:\Users\ShinOnii\Downloads\GigaGAN_Upscaler_v4\venv\lib\site-packages\huggingface_hub\file_download.py", line 437, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
Press any key to continue . . .

FurkanGozukara · Feb 05 '25

Furkan my friend, did you find any solution to this?

kishudoshi01 · Feb 22 '25

Sadly not. It never happened to me, but some of my followers are having it.

FurkanGozukara · Feb 22 '25

Damn.. I am trying to integrate this into my tools, but I saw this error.

kishudoshi01 · Feb 22 '25

Yes, it's annoying sadly. I also have to explain to them how to disable it, and they are not really that professional.

I wish the Hugging Face team had implemented an automated fallback.
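
An automated fallback could look something like this (a hypothetical sketch, not part of huggingface_hub; the wrapper name is made up, and it flips the library's internal constants flag at runtime, which is not a public API and may change between versions):

    import huggingface_hub.constants as hub_constants
    from huggingface_hub import snapshot_download

    def snapshot_download_with_fallback(repo_id: str, **kwargs):
        """Try the accelerated hf_transfer path first, then retry without it."""
        hub_constants.HF_HUB_ENABLE_HF_TRANSFER = True
        try:
            return snapshot_download(repo_id, **kwargs)
        except RuntimeError:
            # hf_transfer failed; retry once with the default HTTP
            # downloader, which reports errors much more clearly.
            hub_constants.HF_HUB_ENABLE_HF_TRANSFER = False
            return snapshot_download(repo_id, **kwargs)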

FurkanGozukara · Feb 22 '25

Wait, where can I put the code to disable it?

kishudoshi01 · Feb 22 '25

set HF_HUB_ENABLE_HF_TRANSFER=0

FurkanGozukara · Feb 22 '25

I added it to my app.py but it didn't work. Where should I put it?

kishudoshi01 · Feb 22 '25

Before importing huggingface_hub and hf_transfer, as far as I know.
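
Concretely, something like this at the very top of app.py, before anything pulls in huggingface_hub (a minimal sketch: as noted above, the flag is read when huggingface_hub is imported, so setting it later has no effect):

    import os

    # Must run before huggingface_hub is imported anywhere in the process.
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

    from huggingface_hub import snapshot_download

    snapshot_download("fal/AuraSR-v2")  # the repo from the traceback above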

FurkanGozukara · Feb 22 '25

So after some research, I found out that multiple requests are made every second to download models from HF servers from all over the world, so this option can't be handled perfectly by Hugging Face. That is what I conclude for now about this error.

kishudoshi01 · Feb 26 '25

I moved the issue; it wasn't on the proper repo.

> I wish the Hugging Face team had implemented an automated fallback.

This library says up front that it's not meant for general-purpose use and that it's provided AS-IS. The reason is quite simple: it abuses the network and the host provider by multiplexing downloads aggressively. On most desktops it's actually detrimental, because it will hog a small network connection and prevent the OS from managing its traffic well.

The performance benefit mostly shows up on large machines with many cores and a very fast network (read: >500 MB/s).

The fact that it causes issues on some systems is totally understandable. There is no way to "protect" against this or recover sanely. Usually the best bet is simply NOT to activate it by default, and to let users activate it on their own.
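
Letting users opt in explicitly could be as simple as this (a hypothetical pattern, nothing official; the --fast-download flag name is made up for the example):

    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fast-download",
        action="store_true",
        help="enable hf_transfer (for very high-bandwidth machines only)",
    )
    args = parser.parse_args()

    if args.fast_download:
        os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

    # Import only after the flag is decided; it is read at import time.
    from huggingface_hub import snapshot_download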

Narsil · Feb 26 '25

Makes sense.

kishudoshi01 · Feb 26 '25

Narsil, I disagree. Currently each connection is limited to 40 megabytes per second, and you rarely reach that.

With this I have even seen 1 gigabyte per second on a cloud machine.

Also, many people nowadays have much better connection speeds, like 1 gigabit.

And especially on poor networks this speeds things up dramatically: I saw a user go from about 2 megabytes per second to over 15 megabytes per second with this.

In some cases it doesn't work, and I still couldn't figure out the pattern. I have lots of users, and for most of them it just works fine.

FurkanGozukara · Feb 26 '25

> And especially on poor networks this speeds things up dramatically: I saw a user go from about 2 megabytes per second to over 15 megabytes per second with this.

This is news to me. I'm not sure how that can be, but I trust you. That changes my statement, then. If this library helps in poor-network situations, I can look into it. Do you happen to know how I could reproduce this (OS, machine, network)? Edit: Maybe the fix won't be to fix this library, but just to make it faster on desktop-like machines.

> In some cases it doesn't work, and I still couldn't figure out the pattern.

Yes, that's the issue, and the one I'm trying to avoid by not making this tool too widely available. Knowing and supporting all those constraints would most certainly mean adding more overhead and reducing speed, and speed is what this library is all about.

Narsil · Feb 26 '25

This poor-network improvement is expected. It has the same effect as uGet: opening more connections lets you use more of the available network. It is especially useful in countries or places with poor connections.

I wish I could reproduce it and give you more info, but so far I couldn't, on my PC or on any of the cloud machines I tested.

However, when someone reports it, I tell them to remove set HF_HUB_ENABLE_HF_TRANSFER=1 and it starts working.

If I can reproduce it or obtain more information, I will let you know.

Is there anything I should pay attention to if I can get more info? What information would you need?

FurkanGozukara · Feb 26 '25

As much information as possible:

  • Location and all network information (WiFi, cable, mobile, ideally the provider).
  • OS (in as much detail as possible: Windows, Linux, WSL).
  • Every version in the stack (Python version, hf_transfer version, etc.).
  • The faulty code.
  • The disk being written to (SSD, HDD, network-mounted..).

Ideally, anything that's reproducible. As I said, the issue could come from many places.

> This poor-network improvement is expected. It has the same effect as uGet: opening more connections lets you use more of the available network.

The only time this is normal is when providers artificially throttle individual connections. This is expected in most scenarios, to drive fair traffic among various peers. However, limiting to ~10 MB/s in today's world feels like an issue.

Multiplexing might be hitting different machines and therefore effectively bypassing the throttling. But by doing that you might be actively harming the provider, which could cut the connection, stall, or even die, and that could lead to the errors we're seeing. If the errors are caused by what the code does to the provider, you can see how finding a solution might be hard.
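
As a rough illustration (hypothetical numbers): if a provider caps each connection at 5 MB/s, a single-stream downloader tops out at 5 MB/s, while eight multiplexed streams can approach 40 MB/s, right up until the provider decides that traffic pattern looks abusive and starts stalling or dropping the connections.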

Narsil · Feb 26 '25

A user just got the error.

I asked him for:

> Location and all network information (WiFi, cable, mobile, ideally the provider). The disk being written to (SSD, HDD, network-mounted..).

The other parameters are as follows.

Python 3.10.11 - Windows VENV

Faulty code:

    snapshot_download(
        repo_id="Wan-AI/Wan2.1-I2V-14B-720P",
        local_dir="Wan2.1/models/Wan-AI/Wan2.1-I2V-14B-720P",
    )

Error

(…)pytorch_model-00001-of-00007.safetensors:   9%|██▉                              | 870M/9.85G [03:15<33:32, 4.46MB/s]
Traceback (most recent call last):
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py", line 498, in http_get
    hf_transfer.download(
Exception: Error while removing corrupted file: The process cannot access the file because it is being used by another process. (os error 32)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\wan_gen\Download_Models.py", line 51, in 
    download_model(sys.argv[1])
  File "C:\wan_gen\Download_Models.py", line 19, in download_model
    snapshot_download(
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\_snapshot_download.py", line 306, in snapshot_download
    _inner_hf_hub_download(file)
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\_snapshot_download.py", line 283, in _inner_hf_hub_download
    return hf_hub_download(
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py", line 1457, in hf_hub_download
    http_get(
  File "C:\Users\pbour\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py", line 509, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
Press any key to continue . . .

Pip Freeze

accelerate==1.4.0
aiofiles==23.2.1
aiohappyeyeballs==2.4.6
aiohttp==3.11.13
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
asttokens==3.0.0
async-timeout==5.0.1
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
colorama==0.4.6
comm==0.2.2
controlnet-aux==0.0.7
cupy-cuda12x==13.3.0
dashscope==1.22.1
decorator==5.2.1
deepspeed @ https://files.pythonhosted.org/packages/3d/65/1a6394f5d6dee851e9ea59e385f6d6428e3bfe36f83c06e0336e14dcfd11/deepspeed-0.16.4-cp310-cp310-win_amd64.whl
-e git+https://github.com/modelscope/DiffSynth-Studio@1d309a14a3baee75ebaff0a5fb7d41044e0bec61#egg=diffsynth
diffusers==0.32.2
easydict==1.13
einops==0.8.1
exceptiongroup==1.2.2
executing==2.2.0
fastapi==0.115.8
fastrlock==0.8.3
ffmpy==0.5.0
filelock==3.13.1
flash_attn @ https://github.com/kingbri1/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
frozenlist==1.5.0
fsspec==2024.6.1
ftfy==6.3.1
gradio==5.18.0
gradio_client==1.7.2
h11==0.14.0
hf_transfer==0.1.9
hjson==3.1.0
httpcore==1.0.7
httpx==0.28.1
huggingface-hub==0.29.1
idna==3.10
imageio==2.37.0
imageio-ffmpeg==0.6.0
importlib_metadata==8.6.1
ipython==8.32.0
ipywidgets==8.1.5
jedi==0.19.2
Jinja2==3.1.3
jupyterlab_widgets==3.0.13
lazy_loader==0.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib-inline==0.1.7
mdurl==0.1.2
modelscope==1.23.1
moviepy==2.1.2
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
networkx==3.3
ninja==1.11.1.3
numpy==2.2.3
nvidia-ml-py==12.570.86
opencv-python==4.11.0.86
orjson==3.10.15
packaging==24.2
pandas==2.2.3
parso==0.8.4
pillow==10.4.0
proglog==0.1.10
prompt_toolkit==3.0.50
propcache==0.3.0
protobuf==5.29.3
psutil==7.0.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
pydantic==2.10.6
pydantic_core==2.27.2
pydub==0.25.1
Pygments==2.19.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.20
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
ruff==0.9.7
safehttpx==0.1.6
safetensors==0.5.3
scikit-image==0.25.2
scipy==1.15.2
semantic-version==2.10.0
sentencepiece==0.2.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
stack-data==0.6.3
starlette==0.45.3
sympy==1.13.1
tifffile==2025.2.18
timm==1.0.15
tokenizers==0.20.3
tomlkit==0.13.2
torch==2.5.1+cu124
torchao==0.8.0
torchvision==0.20.1+cu124
tqdm==4.67.1
traitlets==5.14.3
transformers==4.46.2
triton @ https://github.com/woct0rdho/triton-windows/releases/download/v3.2.0-windows.post10/triton-3.2.0-cp310-cp310-win_amd64.whl
typer==0.15.1
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
wcwidth==0.2.13
websocket-client==1.8.0
websockets==15.0
widgetsnbextension==4.0.13
xformers==0.0.28.post3
yarl==1.18.3
zipp==3.21.0

FurkanGozukara · Feb 26 '25

Could you try with:

pip install git+https://github.com/huggingface/hf_transfer@better_error_windows#egg=hf_transfer

This should surface the original error (the error currently shown is caused by file locking on Windows and is not the original error message).

Narsil · Mar 05 '25

> pip install git+https://github.com/huggingface/hf_transfer@better_error_windows#egg=hf_transfer

Thanks. Whenever someone reports it again, I will hopefully have them try this.

FurkanGozukara · Mar 05 '25

OK, someone is having this problem and was going to test it, but installing with that pip command fails.

He got the same install error as me, shown below:

C:\Users\Furkan>pip install git+https://github.com/huggingface/hf_transfer@better_error_windows#egg=hf_transfer
Collecting hf_transfer
  Cloning https://github.com/huggingface/hf_transfer (to revision better_error_windows) to c:\users\furkan\appdata\local\temp\pip-install-90hiyfud\hf-transfer_c11b217eb6334010babb9fd4b2f99f70
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/hf_transfer 'C:\Users\Furkan\AppData\Local\Temp\pip-install-90hiyfud\hf-transfer_c11b217eb6334010babb9fd4b2f99f70'
  Running command git checkout -b better_error_windows --track origin/better_error_windows
  branch 'better_error_windows' set up to track 'origin/better_error_windows'.
  Switched to a new branch 'better_error_windows'
  Resolved https://github.com/huggingface/hf_transfer to commit d330fc97c13e7b64e1b0bffb93820aa89ce72740
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]

      Cargo, the Rust package manager, is not installed or is not on PATH.
      This package requires Rust and Cargo to compile extensions. Install it through
      the system's package manager or via https://rustup.rs/

      Checking for Rust toolchain....
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

FurkanGozukara · Mar 06 '25

I encountered an error while trying to install hf_transfer using the command pip install git+https://github.com/huggingface/hf_transfer@better_error_windows#egg=hf_transfer. The error indicates that Rust and Cargo are required for compilation: "Cargo, the Rust package manager, is not installed or is not on PATH. This package requires Rust and Cargo to compile extensions." I've already updated pip to the latest version, but the issue persists.

Dlexer-byte · Mar 06 '25

As one of Doc's followers who's affected by this:

Windows 11 Pro, 14900K + 64 GB DDR5 (6800), HF cache on an NVMe for speed, 1 Gbit (900 Mbit) down / 40 Mbit up ISP connection.

It also happens on a system with the following specs:

Core Ultra 7 265K, Server 2025 / Server 2022 / Windows 11 (I've tried it on all three on this machine). It had 128 GB of RAM, but it is now down to 48 due to memory issues running 4 DIMMs; HF cache on an NVMe too; same ISP connection.

Both machines are on a 2.5 Gbit subnet with either 2.5 Gbit or 5 Gbit cards. Network bandwidth should be fine for a single file download.

With this enabled, the transfer gets so far (it varies), stops dead, sits, then crashes out with the original error.

To download models and such, I have to disable this feature, but then it downloads EVERY model at one time. This returns 'READ ERROR retrying download' many times before it finally fails the script and breaks out of it. I then have to restart the script and it goes through this cycle again. Each run of the script gets more and more of the models, but I worry it introduces corruption, and it is also a pain, especially on something like WAN 2.1 with 128 GB of files to download.

Feel free to ask if I didn't include something you want to know about this.
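
For what it's worth, that restart cycle can be automated (a rough sketch, assuming the failures stay transient and letting huggingface_hub keep the files earlier attempts already completed; the repo id is the one from earlier in the thread):

    import time

    from huggingface_hub import snapshot_download

    # Re-run snapshot_download until it finishes; each attempt picks up
    # where the previous ones left off, file by file.
    for attempt in range(1, 11):
        try:
            snapshot_download(repo_id="Wan-AI/Wan2.1-I2V-14B-720P")
            break
        except Exception as err:
            print(f"attempt {attempt} failed ({err}); retrying in 30 s")
            time.sleep(30)
    else:
        raise SystemExit("download kept failing after 10 attempts")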

Weatherby43 · Mar 30 '25

I can confirm the issue. I even got an abuse report from my provider, because hf hub made >100 requests per second and then threw a RuntimeError. It looks like it retries with no timeout.

thaikoh · Apr 18 '25

The error message is complete rubbish on Windows currently.

On Windows you cannot delete a file that is still open, so the error will very likely be "Error while removing corrupted file", since tokio is still holding the file.

The real cause is a download error, not a file-operation error, but users will never know that, since the real cause is not returned.

I believe combining these two error messages would be a good choice when the file operation fails.
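
In Python terms, the suggestion amounts to something like this (an illustrative sketch only; hf_transfer itself is Rust, and the function and variable names here are made up):

    import os

    def cleanup_after_failed_download(download_err: Exception, tmp_path: str):
        """Keep the download error visible even if removing the file also fails."""
        try:
            os.remove(tmp_path)  # on Windows this raises os error 32 while the file is open
        except OSError as cleanup_err:
            # Combine both messages instead of reporting only the cleanup failure.
            raise RuntimeError(
                f"download failed: {download_err}; "
                f"removing corrupted file also failed: {cleanup_err}"
            ) from download_err
        raise download_err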

wkgcass · May 25 '25

Using WSL Ubuntu, and this worked for me. Gemini gave this answer :)

export HF_HUB_ENABLE_HF_TRANSFER=0
ilab model download

TL;DR: It looks like you're encountering an error while trying to download the instructlab/granite-7b-lab-GGUF@main model using ilab model download. The key part of the error message is:

"Downloading failed with the following exception: Downloading model failed with the following error: Downloading model failed with the following Hugging Face Hub error: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling."

This indicates that the download process, which seems to be leveraging hf_transfer (a faster transfer utility for Hugging Face Hub), encountered an issue.

Here's how you can approach debugging and resolving this:

  1. Disable hf_transfer (Recommended First Step): The error message explicitly suggests disabling hf_transfer. This is often the quickest fix if there's an incompatibility or a temporary issue with the hf_transfer utility itself. You can do this by setting an environment variable before running the ilab model download command:

    export HF_HUB_ENABLE_HF_TRANSFER=0
    ilab model download
    

    After running this, the download should proceed using the default Hugging Face Hub download mechanism, which is usually more robust for general cases.

  2. Check Network Connectivity: Even though the error points to hf_transfer, fundamental network issues can cause any download to fail.

    • Ensure you have a stable internet connection.
    • If you're behind a corporate proxy or firewall, ensure it's configured correctly and not blocking connections to huggingface.co. You might need to set proxy environment variables (HTTP_PROXY, HTTPS_PROXY).
  3. Check Disk Space: The model size is 4.08G. Make sure you have enough free disk space in the /home/mehmet/.cache/instructlab/models directory (or the drive where your home directory resides) to accommodate the model.

    You can check disk space with:

    df -h /home/mehmet/.cache/instructlab/models
    
  4. Inspect the Download Path:

    • The download is attempting to write to /home/mehmet/.cache/instructlab/models/.cache/huggingface/download/cBHY6fAjnFSRbyeHvtiPqRVG_SE=.6adeaad8c048b35ea54562c55e454cc32c63118a32c7b8152cf706b290611487.incomplete.
    • Check if mehmet has write permissions to the /home/mehmet/.cache/instructlab/models directory.
    • Sometimes, corrupted incomplete files can cause issues. You could try deleting the .incomplete file and then restarting the download (after disabling hf_transfer).
  5. Clear Hugging Face Cache (if issues persist): If you've had previous failed downloads or corrupted cache entries, clearing the Hugging Face cache might help. Be cautious: This will delete all downloaded models and datasets cached by Hugging Face Hub.

    rm -rf ~/.cache/huggingface/
    

    Then, try the ilab model download command again (preferably with HF_HUB_ENABLE_HF_TRANSFER=0).

  6. Update instructlab and huggingface_hub: Ensure your instructlab and huggingface_hub libraries are up to date. This can resolve bugs or compatibility issues.

    pip install --upgrade instructlab huggingface_hub
    
  7. Try a Different Model (for testing): If none of the above works, it might be a specific issue with that particular model file on Hugging Face Hub, or a temporary issue with their servers. You could try downloading a smaller, different GGUF model directly using the huggingface_hub Python library to see if the issue is with ilab or huggingface_hub itself. (This is more of a diagnostic step).

In summary, start by trying to disable hf_transfer:

export HF_HUB_ENABLE_HF_TRANSFER=0
ilab model download

m2017atTR · Jun 02 '25

Still getting this error

steveepreston · Oct 19 '25

Very aggravating error, as it is silent and tends to die after several hours of downloading, totally losing all progress, without even a clue as to the cause. I've lost several 20+ GB downloads to this vibe-coded piece of trash, yet wget or git clone works perfectly. I think it is worse for people who use VPNs, and it doesn't matter if they are disabled, so it's probably the virtual interface. I'm not wasting any more time figuring it out.

docwild · Dec 07 '25

I coded a downloader app and I am using it everywhere now.

It has 100% resume capability, uses 16 separate connections, and verifies SHA256.

This was the only way I could guarantee downloads.
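
Verification along those lines is a standard pattern (a generic sketch, not his actual app; chunked reads keep multi-gigabyte model files out of memory):

    import hashlib

    def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
        """Hash a file in 1 MiB chunks so huge model files fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Compare against the checksum published on the model page, e.g.:
    # assert sha256_of("model.safetensors") == expected_hex_digest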

FurkanGozukara · Dec 07 '25

@docwild

This was not vibe-coded, but maybe the big disclaimer should have been placed wherever it was that recommended this tool to you:

> DISCLAIMER
>
> This library is a power user tool, to go beyond ~500MB/s on very high bandwidth network, where Python cannot cap out the available bandwidth.
>
> This is not meant to be a general usability tool. It purposefully lacks progressbars and comes generally as-is.
>
> Please file issues only if there's an issue on the underlying downloaded file.

Narsil · Dec 08 '25

And what's the excuse for it being silent on failure?

docwild · Dec 08 '25

This is just not meant to be a general tool used ubiquitously. hf-xet should have replaced this tool for the most part, and it comes with more bells and whistles.

Somehow this library was over-shared and caused many problems that are purely out of scope, as it was never intended to cover all cases, nor to be fully fleshed out. Its goal was to saturate DL/UL speeds on cloud-like networks.

It started as an internal library, and was shared as-is because some other power users needed the DL speed. But the disclaimer exists to warn that it's not a polished product. The warning used to be even more explicit.

Narsil · Dec 09 '25