
Mental Diffusion

Stable diffusion command-line interface
Powered by Diffusers

Version 0.7.5 alpha
Torch 2.2.2+cu121

ComfyUI Bridge for VS Code

  • Command-line interface
  • Websockets server
  • Websockets client (Electron)
  • SD 1.5, SDXL, SDXL-Turbo
  • VAE, TAESD, LoRA
  • Text-to-Image, Image-to-Image, Inpaint
  • Latent preview for SD/SDXL (bmp/webp)
  • Upscaler Real-ESRGAN x2/x4/anime
  • Read and write PNG with metadata
  • Optimized for low specs
  • Supports CPU and GPU

Installation

  • Install Python 3.11.x
  • Install Python packages (see installer.py or requirements.txt)
  • Install Electron
git clone https://github.com/nimadez/mental-diffusion.git

Then edit src/config.json to set your defaults (checkpoint, scheduler, width, height).
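
If you prefer to script that edit, here is a minimal Python sketch; the key names used are assumptions, so check src/config.json for the actual schema:

import json

# Key names below are assumptions; consult src/config.json for the real schema.
with open("src/config.json") as f:
    config = json.load(f)

config["checkpoint"] = "/models/sd.safetensors"  # hypothetical key name

with open("src/config.json", "w") as f:
    json.dump(config, f, indent=2)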

Start server

cd mental-diffusion
python src/mdx.py -serv 8011
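
Any WebSockets client can talk to the running server. A minimal connection sketch using the Python websockets package; the payload shown is a guess with flag-like keys, not the actual message schema:

import asyncio
import json
import websockets  # pip install websockets

async def main():
    # Connect to the server started above on port 8011.
    async with websockets.connect("ws://localhost:8011") as ws:
        # Hypothetical payload; the real schema is defined by mdx.py
        # and the Electron client.
        await ws.send(json.dumps({"prompt": "a lighthouse at dusk", "steps": 25}))
        print(await ws.recv())

asyncio.run(main())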

Start client

cd mental-diffusion
electron src/client/.

Start headless

python src/mdx.py -p "prompt" -c /sd.safetensors -st 20 -g 7.5 -f img_{seed}
python src/mdx.py -p "prompt" -mode xl -c /sdxl.safetensors -w 1024 -h 1024 -st 30 -g 8.0 -f img_{seed}
python src/mdx.py -p "prompt" -pipe img2img -i image.png -sr 0.5
python src/mdx.py -p "prompt" -pipe inpaint -i image.png -m mask.png
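
To script a parameter sweep over separate headless runs, a sketch using Python's subprocess; paths and prompts are placeholders, and each run reloads the checkpoint, so prefer --server or --batch when speed matters:

import subprocess

# Sweep guidance values in separate headless runs.
for guidance in (5.0, 7.5, 10.0):
    subprocess.run([
        "python", "src/mdx.py",
        "-p", "a lighthouse at dusk",
        "-c", "/sd.safetensors",
        "-st", "25",
        "-g", str(guidance),
        "-f", f"g{guidance}_{{seed}}",  # mdx replaces {seed} with the actual seed
    ], check=True)
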
These models are downloaded as needed after launch:
openai/clip-vit-large-patch14 (1.6 GB)
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k (10 MB)
madebyollin/taesd (20 MB)
madebyollin/taesdxl (20 MB)
RealESRGAN_x2plus.pth (65 MB, optional)
RealESRGAN_x4plus.pth (65 MB, optional)
RealESRGAN_x4plus_anime_6B.pth (20 MB, optional)

Command-line

--help                     show this help message and exit

--server     -serv  int    start websockets server (port is required)
--metadata   -meta  str    /path-to-image.png, extract metadata from PNG

--model      -mode  str    sd/xl, set checkpoint model type (def: config.json)
--pipeline   -pipe  str    txt2img/img2img/inpaint, define pipeline (def: txt2img)
--checkpoint -c     str    checkpoint .safetensors path (def: config.json)
--vae        -v     str    optional vae .safetensors path (def: null)
--lora       -l     str    optional lora .safetensors path (def: null)
--lorascale  -ls    float  0.0-1.0, lora scale (def: 1.0)
--scheduler  -sc    str    ddim, ddpm, lcm, pndm, euler_anc, euler, lms (def: config.json)
--prompt     -p     str    positive prompt text input (def: sample)
--negative   -n     str    negative prompt text input (def: empty)
--width      -w     int    width value must be divisible by 8 (def: config.json)
--height     -h     int    height value must be divisible by 8 (def: config.json)
--seed       -s     int    seed number, -1 to randomize (def: -1)
--steps      -st    int    steps from 1 to 100+ (def: 25)
--guidance   -g     float  0.0-20.0+, how closely the output follows the prompt (def: 8.0)
--strength   -sr    float  0.0-1.0, how much noise to add to the input image; 1.0 ignores the original (def: 1.0)
--image      -i     str    PNG file path or base64 PNG (def: null)
--mask       -m     str    PNG file path or base64 PNG (def: null)
--savefile   -sv    bool   true/false, save image to PNG with embedded metadata (def: true)
--onefile    -of    bool   true/false, save the final result only (def: false)
--outpath    -o     str    /path-to-directory (def: .output)
--filename   -f     str    filename prefix (no png extension)
--batch      -b     int    number of repeated runs (def: 1)
--preview    -pv    bool   true/false, show latent preview; stepping is slower when enabled (def: false)
--upscale    -up    str    x2, x4, x4anime (def: null)
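
The metadata embedded by --savefile can also be read outside of mdx (--metadata does this from the command line). A sketch using Pillow's PNG text-chunk access; the exact keys mdx writes are not documented here, so the sketch prints everything it finds:

from PIL import Image  # pip install Pillow

img = Image.open("img_12345.png")  # placeholder filename
# PNG text chunks (tEXt/zTXt/iTXt) carry the embedded metadata.
for key, value in img.text.items():
    print(f"{key}: {value}")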

[model settings]
SDXL: 1024x1024 or 768x768, --steps >20
SDXL-Turbo: 512x512, --steps 1-4, --guidance 0.0
LCM-LoRA: "lcm" scheduler, --steps 2-8+, --guidance 0.0-2.0
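
Put together as a headless run, the SDXL-Turbo recipe above would look like this (the checkpoint path is a placeholder):

python src/mdx.py -p "prompt" -mode xl -c /sdxl-turbo.safetensors -w 512 -h 512 -st 4 -g 0.0 -f img_{seed}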

* --server or --batch is recommended, because the checkpoint is loaded once instead of being reloaded for every image
* Add "{seed}" to --filename and it will be replaced by the actual seed
* To load SDXL with only 3.5 GB of VRAM, you need at least 16 GB of RAM and virtual-memory paging

Websockets Client

- The client fetches config data from the server on first launch only (refresh to update)
- You can continue the preview image with the img2img pipeline
- The last replay is always available and can be saved in .webp format
- Prompts, outpath, and filename are saved and restored on refresh
- The upscaled image is saved to file and is not returned
- Right-click an image to open the popup menu

Latent Preview

Test LoRA + VAE

* Juggernaut Aftermath, TRCVAE, World of Origami

Test SDXL

* OpenDalleV1.1

Test SDXL-Turbo

* A cinematic shot of a baby racoon wearing an intricate italian priest robe.

Known Issues

:: Stepping is slower with preview enabled
Previews use the BMP format, which has no compression.
Reminder: one workaround is to set "pipe._guidance_scale" to 0.0 after 40% of the steps.
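
That reminder refers to the dynamic classifier-free-guidance trick: once CFG is disabled, each remaining step runs a single UNet pass instead of two. A sketch of the idea in plain Diffusers (not mdx's own code), with placeholder checkpoint and prompt:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "/sd.safetensors", torch_dtype=torch.float16
).to("cuda")

def disable_cfg_after_40(pipe, step_index, timestep, callback_kwargs):
    # After 40% of the steps, keep only the conditional embeddings
    # and zero the guidance scale, halving the UNet work per step.
    if step_index == int(pipe.num_timesteps * 0.4):
        callback_kwargs["prompt_embeds"] = callback_kwargs["prompt_embeds"].chunk(2)[-1]
        pipe._guidance_scale = 0.0
    return callback_kwargs

image = pipe(
    "a lighthouse at dusk",
    num_inference_steps=25,
    callback_on_step_end=disable_cfg_after_40,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]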

:: Interrupt button does not work
You have to wait for the current step to finish;
the interrupt is applied at the beginning of each step.
CTRL+C terminates the server, after which the interrupt button no longer works.

:: Connection timeout error on loading checkpoint
Your ISP may be restricting access to HuggingFace.
Set "http_proxy" in config.json (e.g. "127.0.0.1:8118").

History

↑ Back to the roots (diffusers)
↑ Ported to VS Code
↑ Switched from Diffusers to ComfyUI
↑ Upgraded from sdkit to Diffusers
↑ Undiff renamed to Mental Diffusion
↑ Undiff started with "sdkit"
↑ Created for my personal use

License

Code released under the MIT license.

Credits