Pedro Cuenca

331 comments by Pedro Cuenca

I think the method is very well described here: https://huggingface.co/blog/assisted-generation, and there are some benchmarks with real-life gains across different GPUs and tasks. TL;DR: it's helpful most of the time, even when...
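The core idea can be sketched with a toy greedy version. The `draft_next` and `target_next` callables below are hypothetical stand-ins for the assistant and main models (each maps a token sequence to the next token id); in the real method the target model verifies all candidate positions in a single batched forward pass, not a Python loop.

```python
def assisted_generate(prompt, target_next, draft_next, n_new=8, k=4):
    """Generate n_new tokens greedily, letting a cheap draft model propose
    up-to-k-token candidates that the target model then verifies."""
    seq = list(prompt)
    generated = 0
    while generated < n_new:
        # 1) Draft model proposes candidate tokens autoregressively.
        candidate = []
        for _ in range(min(k, n_new - generated)):
            candidate.append(draft_next(seq + candidate))
        # 2) Target model checks each candidate position; the matching
        #    prefix is accepted wholesale.
        accepted = 0
        for i, tok in enumerate(candidate):
            if target_next(seq + candidate[:i]) == tok:
                accepted += 1
            else:
                break
        seq += candidate[:accepted]
        generated += accepted
        # 3) On a mismatch, the target model's own prediction supplies the
        #    next token, so progress is guaranteed every round.
        if generated < n_new and accepted < len(candidate):
            seq.append(target_next(seq))
            generated += 1
    return seq
```

A useful property to notice: because every token is either verified or produced by the target model, the output is identical to plain greedy decoding with the target model alone; the draft model only changes how fast you get there.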

Oh, that's probably because scaled dot-product attention is enabled by default if torch 2 is in use. `pipe.unet.set_default_attn_processor()` should work. I can test and submit a PR in a few...
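To illustrate the mechanism being discussed, here is a conceptual sketch of the processor-swap pattern, assuming hypothetical toy classes rather than the actual diffusers internals: when torch 2 is available, a fused scaled-dot-product-attention processor is selected automatically, and `set_default_attn_processor()` swaps back to the explicit implementation (useful, e.g., when tracing for conversion).

```python
import math
import torch
import torch.nn.functional as F

class VanillaAttnProcessor:
    """Explicit softmax(QK^T / sqrt(d)) V, which traces to plain ops."""
    def __call__(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
        return scores.softmax(dim=-1) @ v

class FusedAttnProcessor:
    """Fused kernel, available when torch >= 2.0."""
    def __call__(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)

class ToyAttention(torch.nn.Module):
    """Toy stand-in for an attention block (hypothetical, not the real class)."""
    def __init__(self):
        super().__init__()
        use_fused = hasattr(F, "scaled_dot_product_attention")
        self.processor = FusedAttnProcessor() if use_fused else VanillaAttnProcessor()

    def set_default_attn_processor(self):
        # Swap back to the explicit implementation before tracing/conversion.
        self.processor = VanillaAttnProcessor()

    def forward(self, q, k, v):
        return self.processor(q, k, v)
```

Both processors compute the same attention; only the op graph differs, which is what matters for conversion tooling.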

Hello @rovo79! Conversion works for me.

- Would you mind sharing the exact conversion command you used, so we can try to reproduce?
- Did you try with the stable...

Update: I could reproduce with PyTorch 2.1.0, which was released yesterday. In the meantime, I recommend you use PyTorch 2.0.1 to convert your model.

Another workaround is to add the following line after the pipeline has been loaded:

```py
pipe.vae.set_default_attn_processor()
```

Reference: https://github.com/huggingface/diffusers/issues/3115. In addition, many ControlNet models already contain the `base_model` property (added manually or trained using the Flax script). See for example https://huggingface.co/lllyasviel/sd-controlnet-canny/blob/main/README.md
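For reference, the `base_model` property lives in the YAML front matter of the model card (the `README.md`); a minimal entry might look like this, with the repo name purely illustrative:

```yaml
---
base_model: runwayml/stable-diffusion-v1-5
---
```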

Looking into it. Thanks, I didn't realize the encoder inputs had changed.

Submitted these PRs after converting the VAE encoder again:

https://huggingface.co/apple/coreml-stable-diffusion-v1-4/discussions/4/files
https://huggingface.co/apple/coreml-stable-diffusion-v1-5/discussions/5/files
https://huggingface.co/apple/coreml-stable-diffusion-2-base/discussions/6/files
https://huggingface.co/apple/coreml-stable-diffusion-2-1-base/discussions/2/files

Tested locally with this script:

```bash
declare -a repos=(
  coreml-stable-diffusion-v1-4
  coreml-stable-diffusion-v1-5
  coreml-stable-diffusion-2-base
  coreml-stable-diffusion-2-1-base
)
for repo in...
```

Thanks for the confirmation @keijiro! I just merged those PRs so I think this issue can be closed now :)

I followed the same path independently, and can confirm that using bilinear instead of bicubic interpolation for the position encodings results in no noticeable visual differences in the generated depth map.
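The resizing step under discussion can be sketched as follows. This is a hypothetical helper (not the model's actual code) for ViT-style position embeddings of shape `(1, H*W, C)`, where `mode` selects between the two interpolation schemes compared above:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, new_hw, mode="bicubic"):
    """Resize a square grid of position embeddings to a new grid size.

    pos_embed: tensor of shape (1, H*W, C); new_hw: (new_H, new_W) tuple.
    """
    _, n, c = pos_embed.shape
    hw = int(n ** 0.5)  # assumes a square H x W grid
    grid = pos_embed.reshape(1, hw, hw, c).permute(0, 3, 1, 2)  # (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode=mode, align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)
```

Switching `mode` between `"bicubic"` and `"bilinear"` is a one-argument change, which makes it easy to compare the two interpolations end to end.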