
Swift generation produces different style/quality images compared to other SD tools

Open kasima opened this issue 1 year ago • 3 comments

I've been experimenting with image generation in Swift using the converted Core ML models. It seems to produce images of a different style (and noticeably worse quality?) than other Stable Diffusion tools for a given model version and set of generation parameters. The Python CLI, using the same converted Core ML models, produces images that are in the same vicinity as the other tools.

I'm new to the AI image generation space and would much appreciate any help with a few questions:

  • Does the last row of images (Swift CLI) seem like it's very different from the other rows to anyone else? It's somewhat subjective.
  • Why would generation with Swift consistently produce images whose quality/style/coherence differ from those of all the other tools?
  • How can I get Swift generation on the Neural Engine to produce images of similar quality to the other Stable Diffusion tools?

Here's what I've been looking at:

Parameters

  • Model: a version of Stable Diffusion 1.5
  • Prompt: personification of Halloween holiday in the form of a cute girl with short hair and a villain's smile, cute hats, cute cheeks, unreal engine, highly detailed, artgerm digital illustration, woo tooth, studio ghibli, deviantart, sharp focus, artstation, by Alexei Vinogradov bakery, sweets, emerald eyes (Inspired by this image)
  • Steps: 30
  • Guidance: 10
  • Seed: random
Tool comparison (each row originally showed four sample images; filenames omitted here):

  • DreamStudio
  • Google Colab
  • DiffusionBee (local)
  • InvokeAI (local)
  • Python CLI (local Core ML)
  • Swift CLI (local Core ML)
Python CLI generation command

pre-converted model from huggingface

python -m python_coreml_stable_diffusion.pipeline --prompt "personification of Halloween holiday in the form of a cute girl with short hair and a villain's smile, cute hats, cute cheeks, unreal engine, highly detailed, artgerm digital illustration, woo tooth, studio ghibli, deviantart, sharp focus, artstation, by Alexei Vinogradov bakery, sweets, emerald eyes" -i /Users/kasima/src/huggingface/apple/coreml-stable-diffusion-v1-5/original/packages -o /Users/kasima/scratch --compute-unit ALL --model-version "runwayml/stable-diffusion-v1-5" --num-inference-steps 30 --guidance-scale 10

Swift CLI generation command

pre-converted model from huggingface

swift run StableDiffusionSample "personification of Halloween holiday in the form of cute girl with short hair and a villain's smile, cute hats, cute cheeks, unreal engine, highly detailed, artgerm digital illustration, woo tooth, studio ghibli, deviantart, sharp focus, artstation, by Alexei Vinogradov bakery, sweets, emerald eyes" --negative-prompt "" --resource-path /Users/kasima/src/huggingface/apple/coreml-stable-diffusion-v1-5/split_einsum/compiled/ --output-path /Users/kasima/scratch/swiftcli/comparison --step-count 30 --guidance-scale 10 --image-count 4

kasima avatar Dec 29 '22 14:12 kasima

@kasima maybe this is a silly question, but I see no mention of the seeds used to generate these images. Did you use the same seeds for each column in these examples? And if so, could you share them so we can try to replicate these results?

GuiyeC avatar Dec 29 '22 16:12 GuiyeC

The issue is that this Core ML implementation currently supports only two samplers. The usual SD tools all support additional samplers, such as Euler, which I've found produce great results.

We'll have to wait until Apple implements other samplers in this codebase.

H1p3ri0n avatar Dec 29 '22 20:12 H1p3ri0n

@GuiyeC – All the images were generated with random seeds (updated in the original post). The images in the columns aren't necessarily related to each other. Columns were used for formatting. However, it's an interesting idea to try to keep the seeds the same. Will try that when I get a chance.

@timevision – So the samplers/schedulers might have something to do with it? I believe at least a few of them are using the default PNDM scheduler (Google Colab, Python CLI, Swift CLI, and probably Diffusion Bee as well). Will confirm and try regenerating with the same scheduler.

kasima avatar Dec 30 '22 04:12 kasima
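A footnote on the seed point above: pinning the seed makes every run start from bit-identical latent noise, which is what makes cross-tool comparisons apples-to-apples. A minimal NumPy sketch (the shape is illustrative: SD 1.5 uses a 4×64×64 latent for 512×512 output, but real pipelines draw this noise with their own RNGs, so identical seeds across different tools do not guarantee identical latents):

```python
# Sketch: with a fixed seed the initial latent noise is reproducible, so
# two runs seeded identically start denoising from the same point. With
# random seeds, every image in the comparison grid starts from different
# noise, and per-column comparisons are not meaningful.
import numpy as np

def initial_latents(seed, shape=(1, 4, 64, 64)):
    """Draw the Gaussian latent tensor a diffusion run starts from."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = initial_latents(93)
b = initial_latents(93)   # same seed: bit-identical starting noise
c = initial_latents(94)   # different seed: different starting noise

print(np.array_equal(a, b))
print(np.array_equal(a, c))
```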