Dreambooth-Stable-Diffusion
What is Midjourney's secret? Does anyone know?
I've seen a lot of digital art from both Midjourney and the rest of the AI tools, and there's an artistic depth to Midjourney that the others simply don't have. Sure, here and there I'll find an outstanding creation elsewhere, yet they are far fewer than what one can find in Midjourney. Here's an example using the same prompt (carbon, hydrogen, oxygen, nitrogen, sculpture, abstract, octane rendering, hyper extremism). I used this prompt to represent the basic elements giving rise to the origin of the universe and of life, so I needed an abstract image with enough depth to convey that idea. Here are a couple of examples I got from SD and Midjourney. Guess which one was made by MJ?
[example images: one generated by SD, one by Midjourney]
It was suggested that I include additional prompt terms and settings (DOF, strong bokeh; sampler: dpm2, scale: 7, steps: 61), and yet I cannot find a single result that can truly compete with MJ.
I understand that MJ uses SD, but why is it so far superior? Was MJ trained with paintings by famous artists? Is that the reason?
Emad discusses that here: https://old.reddit.com/r/StableDiffusion/comments/x9xqap/ama_emad_here_hello/inqj7dy/
Basically, they do prompt editing on the way in and post-processing on the way out.
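For illustration, a minimal sketch of what "prompt editing on the way in" could look like. The modifier list and the matching rule here are entirely invented; Midjourney's real rewriting rules are not public.

```python
# Hypothetical sketch of "prompt editing on the way in".
# The modifier list is made up for illustration; nothing here is
# Midjourney's actual pipeline.

HOUSE_STYLE_MODIFIERS = [
    "highly detailed",
    "dramatic lighting",
    "artstation",
]

def edit_prompt(user_prompt: str) -> str:
    """Append house-style modifiers the user did not already include."""
    present = user_prompt.lower()
    extras = [m for m in HOUSE_STYLE_MODIFIERS if m not in present]
    return ", ".join([user_prompt] + extras)

print(edit_prompt("carbon, hydrogen, oxygen, nitrogen, sculpture, abstract"))
# -> "carbon, hydrogen, oxygen, nitrogen, sculpture, abstract,
#     highly detailed, dramatic lighting, artstation"
```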
Using the same prompt between services isn't a good idea, as they generate images in different ways, so it's not a like-for-like test. Regardless, one of the forms of post-processing we know MJ does is CLIP guidance, which SD doesn't do. MJ also has rich data from users and ratings and can use that to improve things even further. Emad also mentioned in a Center Stage that they do "something else" but didn't say exactly what.
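Since CLIP guidance came up: true CLIP guidance backpropagates a CLIP image-text similarity score into each denoising step. As a much simpler stand-in, here is a sketch that uses CLIP only to re-rank finished candidates. It assumes the Hugging Face `transformers` checkpoint `openai/clip-vit-base-patch32` and is an illustration of the general idea, not Midjourney's actual mechanism.

```python
# Sketch: CLIP re-ranking of finished candidate images. Real CLIP
# *guidance* instead steers the sampler at every denoising step; this
# re-ranking variant is just the simplest illustration of using CLIP
# as a post-hoc judge.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_match(prompt: str, candidates: list[Image.Image]) -> Image.Image:
    """Return the candidate that CLIP scores as closest to the prompt."""
    inputs = processor(text=[prompt], images=candidates,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = model(**inputs).logits_per_text[0]  # one score per image
    return candidates[int(scores.argmax())]
```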
In short, Stable Diffusion gives you raw, unprocessed output, whereas MJ does a lot of pre- and post-processing, which is why you get such different results.
My theory (without much evidence) is that they built a pre-processor model that works on the embedding of the prompt. So rather than just appending "digital art, Greg Rutkowski", it takes the tensor of your prompt and tries to steer it towards prompt embeddings that were highly rated by their community. In addition, Emad did say they're using CLIP guidance, and he probably knows. You could also have other guidance models alongside CLIP guiding composition and style, and that could achieve a lot. If that's not what they're doing, then this could be an interesting idea for someone else to try... 😅
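To make that theory concrete, here is a minimal PyTorch sketch of such a pre-processor: a small residual MLP that nudges the prompt embedding, which one could imagine training to pull embeddings toward a centroid of community-favoured ones. Every name, dimension, and the training signal are invented; this is the speculation above turned into code, not anything Midjourney has confirmed.

```python
# Hypothetical prompt-embedding "steering" pre-processor, sketched from
# the theory above. All names and dimensions are invented.
import torch
import torch.nn as nn

class PromptSteering(nn.Module):
    def __init__(self, dim: int = 768):  # 768 = CLIP ViT-L/14 text width
        super().__init__()
        self.delta = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim),
        )

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        # Residual nudge: keep the user's intent, shift the style.
        return prompt_emb + self.delta(prompt_emb)

# One imaginable training signal: pull steered embeddings toward the
# centroid of embeddings whose outputs the community rated highly,
# with a small penalty keeping them near the original prompt.
def steering_loss(steered, original, liked_centroid, keep=0.1):
    return ((steered - liked_centroid) ** 2).mean() \
         + keep * ((steered - original) ** 2).mean()
```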
It's just some well-made custom embeddings, along with things like GFPGAN to fix faces, etc.
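As a concrete example of that kind of face-fixing pass, here is the open-source GFPGAN package applied to an SD output. The weights path and filenames are placeholders; this only illustrates the general technique, not Midjourney's stack.

```python
# Face-restoration post-processing with the open-source GFPGAN package.
# The weights path is a placeholder; download the checkpoint from the
# GFPGAN repo first.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # placeholder path to downloaded weights
    upscale=2,
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("sd_output.png", cv2.IMREAD_COLOR)
_, _, restored = restorer.enhance(
    img, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("sd_output_restored.png", restored)
```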
It's like a very well-done Google Colab notebook, but you pay for it.