diffusers
diffusers copied to clipboard
new from_single_file implementation is always internet-first and using local files only on timeout
Describe the bug
new from_single_file implementation is always internet-first and using local config files only after timeout even if they are already present locally as they were previously downloaded.
code flow is from_single_file -> _download_diffusers_model_config_from_hub -> snapshot_download
and snapshot_download function in huggingface_hub is always checking online first and only on timeout fetching already existing local files first
so to avoid triggering snapshot_download, existing check
if not os.path.isdir(default_pretrained_model_config_name) inside from_single_file
needs to be expanded to cover already downloaded files first
you can verify by looking at simple message in console such as:
Fetching 11 files
that message is valid on first use/download, but should not be triggered every single time model load is triggered.
and message only comes from hf_hub online downloader. and if you disconnect from network and try again, everything works just fine and model is loaded without that log message but only after connection timeout.
Reproduction
see above
Logs
No response
System Info
diffusers==0.28.0.dev
Who can help?
@yiyixuxu @DN6 @sayakpaul
hi @vladmandic
want to make sure that I understand the issue
I think with the current implementation this code below should work when the my_local_config_path is a folder. Do you want to make it work when my_local_config_path is a local file as well?
pipe = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
https://huggingface.co/docs/diffusers/main/en/api/loaders/single_file#working-with-local-files
no, current behavior with folder is fine, the issue is that it doesn't match previously donwloaded and cached copy.
for example, current check os.path.isdir(default_pretrained_model_config_name) will only match folder with name stabilitya/stable-diffusion-xl-base-1.0 which doesn't exist (unless user created it manually)
and for everything else it will trigger snapshot_download which first checks online and only after online check returns cached copy.
so either:
huggingface_hubshould change behavior ofsnapshot_downloadto return cached copy BEFORE doing online check or- modify your check to include cached model, not just model name from
default_pretrained_model_config_name,
(for example~/.cache/huggingface/hub/models--stabilityai--stable-diffusion-xl-base-1.0)
and yes, this can be somewhat addressed by setting local_files_only=True, but that has other sideeffects - what if I actually want to download a model...
snapshot_download always first checks online first because otherwise, it would not be able to know if all the files are in the cached folder
(Here, it relies on the repo_info.sibling to filter out the list of files on the repo
https://github.com/huggingface/huggingface_hub/blob/5ff2d150d121d04799b78bc08f2343c21b8f07a9/src/huggingface_hub/_snapshot_download.py#L246)
If you pass local_file_only=True, it will just return the cached folder without checking and we do not guarantee that the folder contains the files. I think checking online first is a reasonable default behavior for snapshat_download, no? also it will not actually download the files again if it is already downloaded
and yes, this can be somewhat addressed by setting local_files_only=True, but that has other sideeffects - what if I actually want to download a model...
I think only use local_files_only if you know for sure that your local folder contains everything you need; otherwise, just use the default setting - sure it will check online first to see if the config files are there, but it is not a big deal, no?
alternatively, I think you can pass a config argument to skip the snapshot_download https://github.com/huggingface/diffusers/blob/be4afa0bb4384f201c8fe68af536faffefbae661/src/diffusers/loaders/single_file.py#L388
i'm constantly updating my models on the hub and this behaviour means that the latest model is always loaded for inference.
disabling the network for the user when they have that problem does seem like a solution? eg. have a way they can just go into "Offline Mode".
downloading config on first access is a good thing. issue is online checks for EVERY access.
passing config manually would be a workaround IF there was a way to know which config to pass. but config is only determined after initial load and then by calling fetch_diffusers_config(checkpoint).
note that intention is good with allowing local_files_only=True as there is a fallback except LocalEntryNotFoundError, but its not working as expected.
UPDATE: fallback seems to be working if original_config=None. if original_config is set, then its completely broken.
downloading config on first access is a good thing. issue is online checks for EVERY access.
but how would we know if the hub isn't updated if we don't check on every access?
@vladmandic
UPDATE: fallback seems to be working if original_config=None. if original_config is set, then its completely broken.
what do you mean by completely broken? can you provide a script? this works for me if original_config is a local file, if it is a URL, we throw an error because it is not a local file
from diffusers import StableDiffusionXLPipeline
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
repo_id = "stabilityai/stable-diffusion-xl-base-1.0"
#original_config = "https://raw.githubusercontent.com/Stability-AI/generative-models/main/configs/inference/sd_xl_base.yaml"
original_config = "sd_xl_base.yaml"
pipe = StableDiffusionXLPipeline.from_single_file(ckpt_path, original_config=original_config, local_files_only=True)
@vladmandic also note here https://huggingface.co/docs/diffusers/main/en/api/loaders/single_file#using-the-original-configuration-file-of-a-model
When using original_config with local_files_only=True, we do not rely on the config files to infer pipeline components (therefore will not try to download it)
but how would we know if the hub isn't updated if we don't check on every access?
lets look at it realistically for a min - for 99.99% of cases, we don't care if config has been updated or not as long as there is a config. how often do configs for base models get updated after they are initially uploaded? we're talking about configs for base model only here (e.g. stabilityai/stable-diffusion-xl-base-1.0), not every model that is of the same type or every actual model weights that may get updated.
and doing online-first does have a big impact. just for example, even huggingface.co is known to be slow at times, so should that cause every model load world wide to be slow as well? and not everyone has a stable and fast always-on link.
so if local files exist instead of "check always", it should be "check on demand only".
what do you mean by completely broken? can you provide a script? this works for me if original_config is a local file, if it is a URL, we throw an error because it is not a local file
i'll provide logs, its not working for me. but that's a separate topic and i'll create separate issue for it.
@vladmandic
for "check on-demand only" (when original_config = None) it seems like you will be able to get exactly just that with local_files_only=True , no?
So, what's left here is to figure out how to make it work this way when original_config is passed, right? yes it is a separate logic so feel free to open a new issue :) and some logs will be nice
correct on both notes :)
Hi @vladmandic could you please provide a code example I could use to reproduce/test the issue you're facing?
i've downloaded common configs locally and passing them using config works fine,
no point of chasing issues using legacy original_config so i'll close the issue.