hf-hub icon indicating copy to clipboard operation
hf-hub copied to clipboard

Sync ApiRepo.get("file") panics instead of returning Err(_) when failing to acquire lock for downloading file in multi-threaded context.

Open august99us opened this issue 7 months ago • 1 comments

Hey, just starting in open source so please bear with me,

I noticed when using the sync API, having multiple threads doing the same work will crash all but one thread when trying to retrieve model files on the first application run. Looking into it further, it looks like this is because the sync API calls .unwrap() directly on the lock_file() call here (which can possibly return ApiError::LockAcquisition): https://github.com/huggingface/hf-hub/blob/main/src/api/sync.rs#L761

The tokio version of the api propagates this error upwards instead: (? instead of .unwrap()) https://github.com/huggingface/hf-hub/blob/main/src/api/tokio.rs#L880

Steps to reproduce:

use std::{fs, path::PathBuf, thread};

use hf_hub::{api::sync::{ApiBuilder, ApiError}, Cache};

fn main() {
    let testing_dir = "test-lock-panic-5a583aea-3aa0-4598-9b37-64477853e185";
    fs::create_dir(&testing_dir).unwrap();
    
    let mut handles: Vec<thread::JoinHandle<()>> = vec![];
    for _i in 0..5 {
        let testing_dir_clone = testing_dir.to_string();
        handles.push(thread::spawn(move || {
            let cache = Cache::new(PathBuf::from(testing_dir_clone));
            let api = ApiBuilder::from_cache(cache).build().unwrap();
            let api_repo = api.repo(hf_hub::Repo::new(
                "august99us/testing-model".to_string(),
                hf_hub::RepoType::Model,
            ));
            let model_file = api_repo.get("large_file.dat");

            assert!(model_file.is_ok() || model_file.is_err_and(|e| matches!(e, ApiError::LockAcquisition(_))));
        }));
    }
    
    let result = handles.into_iter().map(|h| h.join()).collect::<Result<(), _>>();

    fs::remove_dir_all(&testing_dir).unwrap();

    assert!(result.is_ok());
    println!("Yay! Nobody panicked!");
}

Will panic if any of the threads panic.

Proposing to also propagate the LockAcquisition error upwards in the sync API.

august99us avatar May 10 '25 01:05 august99us

https://github.com/huggingface/hf-hub/pull/106 I made a pull request for this.

august99us avatar May 10 '25 01:05 august99us

Thanks a lot of the PR!

This was indeed an oversight from me, thanks for catching it.

Narsil avatar Jun 16 '25 08:06 Narsil