Diff-Harmonization
Error running the inference command
Whenever I run the command to harmonize multiple images, I get this error:
OSError: Can't load config for 'stabilityai/stable-diffusion-2-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'stabilityai/stable-diffusion-2-base' is the correct path to a directory containing a model_index.json file
And when I try logging in to Hugging Face using `huggingface-cli login`, I get this: requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/whoami-v2 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: EE certificate key too weak (_ssl.c:1007)')))"), '(Request ID: eb7b7154-574c-4480-a29a-89ea3d9e238e)')
Please help me solve the issue.
Hi @raniasyed ,
I haven't encountered this issue before, but you could have a look at https://github.com/huggingface/transformers/issues/17611, which is quite similar.
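The "EE certificate key too weak" message often means that something between you and huggingface.co (e.g., a corporate proxy) is intercepting TLS and serving its own certificate. If you want to check what certificate your machine actually receives, here is a small standard-library sketch; it disables verification only to inspect the certificate and should not be used for real downloads:

```python
import socket, ssl

# Verification is disabled here ONLY to inspect the certificate that is
# actually served; do not reuse this context for real requests.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with ctx.wrap_socket(socket.socket(), server_hostname="huggingface.co") as s:
    s.connect(("huggingface.co", 443))
    der_cert = s.getpeercert(binary_form=True)

# Print the PEM certificate; if the issuer is your company proxy rather
# than a public CA, that explains the "key too weak" failure.
print(ssl.DER_cert_to_PEM_cert(der_cert))
```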
Besides, the problem can sometimes be network-related. You can also try adding this snippet at the start of the code (refer to https://github.com/CompVis/stable-diffusion/issues/302#issuecomment-2042642642):
```python
import os

# Must be set before huggingface_hub / diffusers are imported,
# otherwise the default endpoint has already been resolved.
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
```
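If the mirror doesn't help either, another option is to download the weights once (e.g., on a machine or network where the SSL problem doesn't occur) and load them from a local directory, which is what the OSError's hint about a directory containing a model_index.json file refers to. A minimal sketch, assuming a standard diffusers pipeline (the repo's own loading code may construct the pipeline differently):

```python
from huggingface_hub import snapshot_download
from diffusers import StableDiffusionPipeline

# One-time download into a local folder (the path is just an example).
local_dir = snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-base",
    local_dir="./stable-diffusion-2-base",
)

# Afterwards everything loads from disk, so the "Can't load config"
# error caused by a failed network lookup no longer applies.
pipe = StableDiffusionPipeline.from_pretrained(local_dir)
```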
Hey, thank you for helping me out with that @WindVChen! I just have one more question. The inference script takes a lot of time even for a single image. If I need to reduce the runtime, do you think removing the text embedding optimization, or commenting out the UNet and passing the background image directly, would work without affecting the code too much? Also, can you point out where these are used in the code? I couldn't quite grasp it since I am a beginner.
Hi @raniasyed ,
The time cost mainly comes from the multiple iterations of text embedding optimization (see the code here), null-text embedding optimization (see the code here), and the multiple rounds of the above operations (see the code here).
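To give an intuition for why this is slow, here is an illustrative sketch only (dummy shapes stand in for the real UNet and latents; this is not the repo's actual code): each optimization step is a full forward/backward pass through the denoiser, and that inner loop is repeated across timesteps and harmonization rounds.

```python
import torch

# Stand-in for the UNet: any differentiable module works for the sketch.
unet = torch.nn.Linear(768, 768)
emb = torch.randn(1, 77, 768, requires_grad=True)  # text embeddings being optimized
target = torch.randn(1, 77, 768)                   # e.g., noise recorded during inversion
optimizer = torch.optim.Adam([emb], lr=1e-2)

for step in range(10):  # repeated per timestep and per harmonization round
    pred = unet(emb)    # a full model pass on every optimization step
    loss = torch.nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```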
Based on the ablation-study visualizations in Fig. 11 (paper v2), removing the text embedding optimization will deteriorate the performance. As for "commenting out the UNet", I'm not quite sure what you mean. Do you mean removing the UNet from the diffusion model's structure? And could you explain a bit more what "passing the background image directly" means?
A straightforward way to reduce the time cost is early stopping: stop once you have obtained a satisfactory result after a harmonization round. Another possible way is to replace the DDIM scheduler (50 steps) with a faster scheduler such as DPMSolver (about 20 steps). To achieve that, you may also need to adjust hyperparameters such as the learning rates in the script for good adaptation.
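For reference, this is the standard diffusers pattern for the scheduler swap; it assumes a plain StableDiffusionPipeline, so adapting it to this repo (which also runs DDIM inversion and embedding optimization on top) may need more changes than just these lines:

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")

# Swap the default scheduler for DPMSolver, which typically reaches
# comparable quality in ~20 steps instead of DDIM's 50.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("an example prompt", num_inference_steps=20).images[0]
```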