
Poor Image Generation

rovo79 opened this issue on Dec 20, 2022 · 4 comments

When I run this, none of my test prompts come out looking anything like the demo images; they look terrible. Any ideas what might be going wrong? I may just have the wrong expectations.

(Attached image: randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base)

I'm running Stable Diffusion 2 Base on a Mac mini M1 with 16 GB of RAM, under Python 3.10.0.

Conversion command: python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o models --model-version 'stabilityai/stable-diffusion-2-base'

Generation command: python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output -i models --model-version 'stabilityai/stable-diffusion-2-base'
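(For anyone comparing: here is roughly the equivalent run through the original PyTorch weights via diffusers — a sketch, with the MPS device and step count assumed — which can help separate a conversion problem from the model's normal output for this prompt.)

```python
# Sketch: generate the same prompt/seed with the original diffusers pipeline
# to compare against the Core ML output. Device and step count are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
pipe = pipe.to("mps")  # Apple Silicon; use "cpu" if MPS is unavailable

generator = torch.Generator("cpu").manual_seed(93)  # same seed as the Core ML run
image = pipe(
    "a photo of an astronaut riding a horse on mars",
    generator=generator,
    num_inference_steps=50,
).images[0]
image.save("reference_93.png")
```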

rovo79 · Dec 20 '22

I realize now that my expectations, based on the install README, were completely off. I thought the demo prompt was meant to produce an image similar to those in the Example Results section. It seems those images were created with more sophisticated prompting, and the README assumes readers will just know that. I was using the Example Results as a gauge to tell whether my local setup was correct.

It might be helpful to others working through the initial setup to clarify that those are not the results you will get with the simple prompt "a photo of an astronaut riding a horse on mars" and the default seed of 93.

(Attached screenshot: Screenshot 2022-12-20 at 7 39 25 AM)

rovo79 · Dec 20 '22

I guess the issue here is the absence of a way to pass negative prompts alongside the input prompt. The v2 model in particular is known to be almost garbage without negative prompts. There are some related PRs, though. With v1-5 you should get nicer outputs for the time being.
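For reference, this is roughly how negative prompts look in the upstream diffusers library, which the Core ML pipeline doesn't expose yet — a minimal sketch, with the negative prompt text purely illustrative:

```python
# Sketch using the upstream diffusers API (not the Core ML pipeline).
# The negative_prompt argument steers classifier-free guidance away from
# the listed concepts; the exact text here is just an example.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
pipe = pipe.to("mps")  # Apple Silicon backend; fall back to "cpu" if needed

image = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    negative_prompt="blurry, low quality, deformed, watermark",
    guidance_scale=7.5,
).images[0]
image.save("astronaut_negative.png")
```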

fowtwtio · Dec 20 '22

Well, it definitely isn't "bad" now that I've adjusted my expectations for the prompt "a photo of an astronaut riding a horse on mars".

I generated these images using a longer description, and they show promise. I think the issue now might be that the output is still 512x512. I could have this totally wrong, but I think I read that SD 2 outputs at 768x768?

(Attached images: a_photo_of_an_astronaut_riding_a_horse_on_mars,_digital_art,_trending_on_De… 0/1/2 197097783 final)

rovo79 · Dec 20 '22

> I could have this totally wrong, but I think I read that SD 2 outputs at 768x768?

As noted elsewhere (https://github.com/apple/ml-stable-diffusion/issues/81#issuecomment-1364509035), the base model generates 512x512 images, whereas the "full" model generates 768x768. The current conversion code seems to be a bit limited and only supports a subset of what the original diffusers library can do. Another limitation is input shapes / image dimensions, as noted in https://github.com/apple/ml-stable-diffusion/issues/64.
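For illustration, the full model can be run at its native resolution through diffusers directly — a sketch, assuming the stabilityai/stable-diffusion-2 checkpoint rather than the -base one used in this thread:

```python
# Sketch: the "full" v2 checkpoint is trained at 768x768, so height/width
# are set explicitly to match its native resolution.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")
pipe = pipe.to("mps")  # or "cpu" / "cuda" depending on hardware

image = pipe(
    "a photo of an astronaut riding a horse on mars",
    height=768,
    width=768,
).images[0]
image.save("astronaut_768.png")
```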

Lower-res images could be upscaled, e.g. with https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler; however, that particular model cannot be converted right now since the converter code is incomplete (https://github.com/apple/ml-stable-diffusion/blob/main/python_coreml_stable_diffusion/unet.py#L822-L829).
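Until the converter supports it, that upscaler can still be run through diffusers in PyTorch — a minimal sketch, with file names as placeholders:

```python
# Sketch: run the x4 upscaler through diffusers, since the Core ML converter
# cannot handle this model yet. Upscaling a 512x512 input is memory-hungry.
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler"
)
pipe = pipe.to("mps")  # or "cpu" / "cuda" depending on hardware

low_res = Image.open("astronaut_512.png").convert("RGB")  # placeholder file name
upscaled = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    image=low_res,
).images[0]
upscaled.save("astronaut_2048.png")
```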

mfellner · Dec 25 '22