
Mental Diffusion

Stable diffusion command-line interface
Powered by Diffusers

Version 0.7.5 alpha
Torch 2.2.2+cu121

ComfyUI Bridge for VS Code

  • Command-line interface
  • Websockets server
  • Websockets client (Electron)
  • SD 1.5, SDXL, SDXL-Turbo
  • VAE, TAESD, LoRA
  • Text-to-Image, Image-to-Image, Inpaint
  • Latent preview for SD/SDXL (bmp/webp)
  • Upscaler Real-ESRGAN x2/x4/anime
  • Read and write PNG with metadata
  • Optimized for low specs
  • Supports CPU and GPU

Installation

  • Install Python 3.11.x
  • Install Python packages (see installer.py or requirements.txt)
  • Install Electron
git clone https://github.com/nimadez/mental-diffusion.git

Then edit src/config.json to set your defaults (checkpoint, scheduler, width, height).
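
If you prefer to script that edit, here is a minimal Python sketch; the key names used are assumptions, so check src/config.json for the actual schema:

import json

# Key names below are assumptions; consult src/config.json for the real schema.
with open("src/config.json") as f:
    config = json.load(f)

config["checkpoint"] = "/models/sd.safetensors"  # hypothetical key name

with open("src/config.json", "w") as f:
    json.dump(config, f, indent=2)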

Start server

cd mental-diffusion
python src/mdx.py -serv 8011
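
Any WebSockets client can talk to the running server. A minimal connection sketch using the Python websockets package; the payload shown is a guess with flag-like keys, not the actual message schema:

import asyncio
import json
import websockets  # pip install websockets

async def main():
    # Connect to the server started above on port 8011.
    async with websockets.connect("ws://localhost:8011") as ws:
        # Hypothetical payload; the real schema is defined by mdx.py
        # and the Electron client.
        await ws.send(json.dumps({"prompt": "a lighthouse at dusk", "steps": 25}))
        print(await ws.recv())

asyncio.run(main())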

Start client

cd mental-diffusion
electron src/client/.

Start headless

python src/mdx.py -p "prompt" -c /sd.safetensors -st 20 -g 7.5 -f img_{seed}
python src/mdx.py -p "prompt" -mode xl -c /sdxl.safetensors -w 1024 -h 1024 -st 30 -g 8.0 -f img_{seed}
python src/mdx.py -p "prompt" -pipe img2img -i image.png -sr 0.5
python src/mdx.py -p "prompt" -pipe inpaint -i image.png -m mask.png
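
To script a parameter sweep over separate headless runs, a sketch using Python's subprocess; paths and prompts are placeholders, and each run reloads the checkpoint, so prefer --server or --batch when speed matters:

import subprocess

# Sweep guidance values in separate headless runs.
for guidance in (5.0, 7.5, 10.0):
    subprocess.run([
        "python", "src/mdx.py",
        "-p", "a lighthouse at dusk",
        "-c", "/sd.safetensors",
        "-st", "25",
        "-g", str(guidance),
        "-f", f"g{guidance}_{{seed}}",  # mdx replaces {seed} with the actual seed
    ], check=True)
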
These models are downloaded as needed after launch:
openai/clip-vit-large-patch14 (1.6 GB)
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k (10 MB)
madebyollin/taesd (20 MB)
madebyollin/taesdxl (20 MB)
RealESRGAN_x2plus.pth (65 MB, optional)
RealESRGAN_x4plus.pth (65 MB, optional)
RealESRGAN_x4plus_anime_6B.pth (20 MB, optional)

Command-line

--help                     show this help message and exit

--server     -serv  int    start websockets server (port is required)
--metadata   -meta  str    /path-to-image.png, extract metadata from PNG

--model      -mode  str    sd/xl, set checkpoint model type (def: config.json)
--pipeline   -pipe  str    txt2img/img2img/inpaint, define pipeline (def: txt2img)
--checkpoint -c     str    checkpoint .safetensors path (def: config.json)
--vae        -v     str    optional vae .safetensors path (def: null)
--lora       -l     str    optional lora .safetensors path (def: null)
--lorascale  -ls    float  0.0-1.0, lora scale (def: 1.0)
--scheduler  -sc    str    ddim, ddpm, lcm, pndm, euler_anc, euler, lms (def: config.json)
--prompt     -p     str    positive prompt text input (def: sample)
--negative   -n     str    negative prompt text input (def: empty)
--width      -w     int    width value must be divisible by 8 (def: config.json)
--height     -h     int    height value must be divisible by 8 (def: config.json)
--seed       -s     int    seed number, -1 to randomize (def: -1)
--steps      -st    int    steps from 1 to 100+ (def: 25)
--guidance   -g     float  0.0-20.0+, how closely the output follows the prompt (def: 8.0)
--strength   -sr    float  0.0-1.0, how much noise to add to the input image; 1.0 ignores the original (def: 1.0)
--image      -i     str    PNG file path or base64 PNG (def: null)
--mask       -m     str    PNG file path or base64 PNG (def: null)
--savefile   -sv    bool   true/false, save image to PNG with embedded metadata (def: true)
--onefile    -of    bool   true/false, save the final result only (def: false)
--outpath    -o     str    /path-to-directory (def: .output)
--filename   -f     str    filename prefix (no png extension)
--batch      -b     int    number of repeated runs (def: 1)
--preview    -pv    bool   true/false, show latent preview; stepping is slower when enabled (def: false)
--upscale    -up    str    x2, x4, x4anime (def: null)
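
The metadata embedded by --savefile can also be read outside of mdx (--metadata does this from the command line). A sketch using Pillow's PNG text-chunk access; the exact keys mdx writes are not documented here, so the sketch prints everything it finds:

from PIL import Image  # pip install Pillow

img = Image.open("img_12345.png")  # placeholder filename
# PNG text chunks (tEXt/zTXt/iTXt) carry the embedded metadata.
for key, value in img.text.items():
    print(f"{key}: {value}")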

[model settings]
SDXL: 1024x1024 or 768x768, --steps >20
SDXL-Turbo: 512x512, --steps 1-4, --guidance 0.0
LCM-LoRA: "lcm" scheduler, --steps 2-8+, --guidance 0.0-2.0
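
Put together as a headless run, the SDXL-Turbo recipe above would look like this (the checkpoint path is a placeholder):

python src/mdx.py -p "prompt" -mode xl -c /sdxl-turbo.safetensors -w 512 -h 512 -st 4 -g 0.0 -f img_{seed}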

* --server or --batch is recommended, because the checkpoint is loaded once instead of being reloaded for every image
* Add "{seed}" to --filename and it will be replaced by the actual seed
* To load SDXL with only 3.5 GB of VRAM, you need at least 16 GB of RAM and virtual-memory paging

Websockets Client

- The client fetches config data from the server on first launch only (refresh to update)
- You can continue the preview image with the img2img pipeline
- The last replay is always available and can be saved in .webp format
- Prompts, outpath, and filename are saved and restored on refresh
- The upscaled image is saved to file and is not returned
- Right-click an image to open the popup menu

Latent Preview

Test LoRA + VAE

* Juggernaut Aftermath, TRCVAE, World of Origami

Test SDXL

* OpenDalleV1.1

Test SDXL-Turbo

* A cinematic shot of a baby racoon wearing an intricate italian priest robe.

Known Issues

:: Stepping is slower with preview enabled
Previews use the BMP format, which has no compression.
Reminder: one workaround is to set "pipe._guidance_scale" to 0.0 after 40% of the steps.
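
That reminder refers to the dynamic classifier-free-guidance trick: once CFG is disabled, each remaining step runs a single UNet pass instead of two. A sketch of the idea in plain Diffusers (not mdx's own code), with placeholder checkpoint and prompt:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "/sd.safetensors", torch_dtype=torch.float16
).to("cuda")

def disable_cfg_after_40(pipe, step_index, timestep, callback_kwargs):
    # After 40% of the steps, keep only the conditional embeddings
    # and zero the guidance scale, halving the UNet work per step.
    if step_index == int(pipe.num_timesteps * 0.4):
        callback_kwargs["prompt_embeds"] = callback_kwargs["prompt_embeds"].chunk(2)[-1]
        pipe._guidance_scale = 0.0
    return callback_kwargs

image = pipe(
    "a lighthouse at dusk",
    num_inference_steps=25,
    callback_on_step_end=disable_cfg_after_40,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]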

:: Interrupt button does not work
You have to wait for the current step to finish;
the interrupt is applied at the beginning of each step.
CTRL+C terminates the server, after which the interrupt button no longer works.

:: Connection timeout error on loading checkpoint
Your ISP may be restricting access to HuggingFace.
Set "http_proxy" in config.json (e.g. "127.0.0.1:8118").

History

↑ Back to the roots (diffusers)
↑ Ported to VS Code
↑ Switched from Diffusers to ComfyUI
↑ Upgraded from sdkit to Diffusers
↑ Undiff renamed to Mental Diffusion
↑ Undiff started with "sdkit"
↑ Created for my personal use

License

Code released under the MIT license.

Credits