stable-diffusion.cpp
Very simple HTTP server
This is a very simple server that I made to be able to generate different prompts without reloading the models every time.
Edit (03/01/25):
Now with a basic GUI (served at the /index.html endpoint for now). The API has changed a bit, so the rest of the old examples are outdated; I'll fix them later.
Mostly outdated instructions
Starting the server
The syntax is pretty much the same as the CLI.

```
.\build\bin\Release\sd-server.exe --diffusion-model ..\ComfyUI\models\unet\flux1-schnell-Q3_k.gguf --vae ..\ComfyUI\models\vae\ae.q8_0.gguf --clip_l ..\ComfyUI\models\clip\clip_l.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5xxl_q4_k.gguf -p "Default prompt" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "server_output.png"
```
How to use (example):
Using the example client script
- Make sure you have Python with the `requests` and `pillow` modules installed: `pip install requests pillow`
- Launch the client in interactive mode: `python -i examples/server/test_client.py`
Simplest setup
- Make sure you have Python installed with the `requests` module: `pip install requests`
- Open a Python REPL: `python`
- Import the requests module: `>>> import requests`
- Post your prompt directly to the `/txt2img` endpoint: `>>> requests.post("http://localhost:8080/txt2img", "a lovely cat holding a sign says 'flux.cpp'")`
- Images will be saved to disk on the server side, and each generation will overwrite the previous one.
Using json payloads
- Make sure you have Python installed with the `requests` module: `pip install requests`
- Open a Python REPL: `python`
- Import the requests and json modules: `>>> import requests, json`
- Construct your JSON payload with generation parameters: `>>> payload = {'prompt': """a lovely cat holding a sign says "flux.cpp" """, 'height': 768, 'seed': 42, 'sample_steps': 4}`
- Post your payload to the `/txt2img` endpoint: `>>> requests.post("http://localhost:8080/txt2img", json.dumps(payload))`
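As a side note, requests can also serialize the payload itself via the `json` keyword, which should be equivalent here since the server just parses the request body as JSON:

```python
import requests

payload = {'prompt': 'a lovely cat', 'height': 768, 'seed': 42, 'sample_steps': 4}
# requests serializes the dict and sets the Content-Type header itself
requests.post("http://localhost:8080/txt2img", json=payload)
```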
Decoding response using pillow
- Make sure both `requests` and `pillow` are installed: `pip install requests pillow`
- Open a Python REPL: `python`
- Import the requests, json and base64 modules: `>>> import requests, json, base64`
- Import io.BytesIO and PIL.Image: `>>> from io import BytesIO` then `>>> from PIL import Image`
- Get the response from the server: `>>> response = requests.post("http://localhost:8080/txt2img", "a lovely cat holding a sign says 'flux.cpp'")`
- Parse the response text as JSON: `>>> parsed = json.loads(response.text)`
- Decode the base64 image data: `>>> pngbytes = base64.b64decode(parsed[0]["data"])`
- Convert to a PIL Image: `>>> image = Image.open(BytesIO(pngbytes))`
- Display the image in the default viewer: `>>> image.show()`
One-liner
- First import the necessary modules: `>>> import requests, json, base64`, `>>> from io import BytesIO`, `>>> from PIL import Image`
- Use this line to send the request and open all the generated images: `>>> [Image.open(BytesIO(base64.b64decode(img["data"]))).show() for img in json.loads(requests.post("http://localhost:8080/txt2img", json.dumps({'seed': -1, 'batch_count': 4, 'sample_steps': 4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """})).text)]`
- To send another payload after it's finished, press the up arrow and edit the payload.
If you don't want the image viewer to pause the execution of your command, you can do the following (not needed on macOS for some reason):
`>>> from threading import Thread`
`>>> [Thread(target=Image.open(BytesIO(base64.b64decode(img["data"]))).show, args=()).start() for img in json.loads(requests.post("http://localhost:8080/txt2img", json.dumps({'seed': -1, 'batch_count': 4, 'sample_steps': 4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """})).text)]`
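The same thing as a readable script, for anyone who prefers that over the one-liner:

```python
import base64
import json
from io import BytesIO

import requests
from PIL import Image

payload = {"seed": -1, "batch_count": 4, "sample_steps": 4,
           "prompt": 'a lovely cat holding a sign says "flux.cpp"'}
response = requests.post("http://localhost:8080/txt2img", json.dumps(payload))

# The server returns a JSON array of generated images, base64-encoded
for img in json.loads(response.text):
    Image.open(BytesIO(base64.b64decode(img["data"]))).show()
```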
I'm excited about this one, and was attempting to combine it with Vulkan.
I'm seeing a compile-time issue (around the pingpong function) in my merge, and it seems it's in the original as well.
```
stable-diffusion.cpp/examples/server/main.cpp:572:24: error: non-local lambda expression cannot have a capture-default
  572 | const auto pingpong = [&](const httplib::Request &, httplib::Response & res) {
      |                        ^
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp: In lambda function:
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp:672:5: warning: control reaches end of non-void function [-Wreturn-type]
  672 |     };
```
Ah! This function should go; I just added it at the start of development to see if I was able to connect to the server. If it's causing issues, just remove it, along with the few things that depend on it.
@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).
Tested it on gcc 12.2.0-14 on Debian.
Yup, removing the pingpong endpoint allows compilation.
Another thought--the default 'localhost' string didn't work on my end initially. It looks like the llama.cpp server defaults to using 127.0.0.1 instead of 'localhost', so it might be worth setting the default string that way. Not a big deal, though.
I was able to generate an image via requests, but hit a segfault immediately afterwards.
```
[DEBUG] ggml_extend.hpp:977 - flux compute buffer size: 397.27 MB(RAM)
|==================================================| 4/4 - 79.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:977 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967 - computing vae [mode: DECODE] graph completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 14.59s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 341.62s
save result image to 'server_output.png'
Segmentation fault
```
I've played around a bit (not much of a C++ coder at this point), and can't reliably track down where it's coming from. I'm running with batch 1 (so only one image); the first image gets written properly, with tags, then the dreaded segfault.
Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)
Right--I should have said I ran the earlier example with the CPU backend (tried with no BLAS just to confirm it wasn't something in my merge that caused this!). It's much faster with Vulkan.
I can confirm I seem to throw a segfault with the server every time with:
- CPU (no BLAS), from server branch with SDXL
- CPU (no BLAS), from server branch with Flux Schnell, q8 quants
- Vulkan, merged into SkuttleOleg's with Flux Schnell and q8 quants
For each of the above, they run fine with the main cli example (although painfully slowly on CPU).
Hmm, it doesn't happen on my machine, which makes this annoying to debug. I'll try running it on WSL to see if it's a Linux thing.
Edit: It does happen on WSL too! So maybe I can fix it.
@theaerotoad I believe it's fixed now.
@stduhpf Yup, that fixes it. Thank you!
Sure is nice not to have to reload everything each time.
@stduhpf -- This is working pretty well; I played around with it a bit this weekend. I have a few tweaks to enable other inputs to be specified (via HTML form inputs), return the image as part of the POST response, and reduce CPU usage (using t.join() rather than while(1) at the end; the idea is sketched below).
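For what it's worth, here is the busy-wait vs. join idea sketched in Python terms (the actual change is in the C++ server; the names here are purely illustrative):

```python
import threading
import time

def serve():
    # stand-in for the server's listen loop
    while True:
        time.sleep(1)

t = threading.Thread(target=serve)
t.start()

# A busy loop like `while True: pass` keeps one core at 100% doing nothing.
# Blocking on the thread keeps the process alive without spinning:
t.join()
```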
Do you want them? I may just share as a gist, or can branch off your repo. What's your preference?
@theaerotoad Both options are fine with me, thanks for helping.
I thought about returning the image in base64 after each generation, but I was too lazy to implement it.
I just spent hours trying to understand why the server wasn't sending the image metadata as it's supposed to; it turns out PIL automatically strips out the metadata, and the server was working fine 🙃.
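For anyone hitting the same thing: PIL keeps PNG text chunks in `Image.text` but drops them on save unless you pass them back explicitly. A minimal sketch (file names are placeholders):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

im = Image.open("server_output.png")
print(im.text)  # tEXt/iTXt chunks, e.g. the generation parameters

# Re-attach the metadata explicitly, otherwise the saved copy loses it
meta = PngInfo()
for key, value in im.text.items():
    meta.add_text(key, value)
im.save("copy_with_metadata.png", pnginfo=meta)
```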
There are some differences from the AUTOMATIC1111 v1 webui API. You use:
- `sample_steps` instead of `steps`
- `batch_count` instead of `batch_size`
- `sample_method` instead of `sampler_index`

This info might be outdated, though; I just wanted to make my bot work with your API, so this is just what jumped out at me. We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, to avoid being incompatible unnecessarily.
edit: links: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API
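If a client already speaks the AUTOMATIC1111 payload format, a rough translation shim based on the differences above might look like this (a sketch; the exact key names may have drifted since this was written):

```python
import json
import requests

# AUTOMATIC1111 /sdapi/v1/txt2img keys -> sd-server keys, per the list above
A1111_TO_SDCPP = {
    "steps": "sample_steps",
    "batch_size": "batch_count",
    "sampler_index": "sample_method",
}

def translate_payload(payload):
    # rename known keys, pass everything else through unchanged
    return {A1111_TO_SDCPP.get(k, k): v for k, v in payload.items()}

a1111_payload = {"prompt": "a lovely cat", "steps": 4, "batch_size": 2}
requests.post("http://localhost:8080/txt2img", json.dumps(translate_payload(a1111_payload)))
```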
I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h : https://github.com/leejet/stable-diffusion.cpp/blob/14206fd48832ab600d9db75f15acb5062ae2c296/stable-diffusion.h#L148-L164
Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?
> I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h

I see.

> Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

I suppose.
If anyone just wants to run a command:
```
curl -sv --json '{"prompt": "a lovely cat", "seed": -1}' 127.0.0.1:7860/txt2img | jq -r .[0].data | base64 -d - > api_result.png
```
@stduhpf thanks for your work. I'm currently using this PR for PhotoMaker v2, but I get an error when changing the input embedding (I believe it's called "input_id_images_path"). How can I input a different face without reloading the whole SDXL model, or is there some function to reload just the face embedding?
@NNDam You can try with my latest commit. I can't test it on my end, but it should work now?
@stduhpf thanks, I tried but it still doesn't work. The main problem is that when first loading the model, I also need to preload --input-id-images-dir, extracted with the script face_detect.py in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded at startup (and also segfaults if the number of current faces differs from the number of preloaded faces).
Oh I see. Well, even if Support for PhotoMaker Version 2 was merged, I couldn't get this to work with the current architecture of the server, sorry. Have you tried with photomaker v1?
Hi @bssrdf, can you help us ?
@NNDam, I'll see what can be done to make it work. PhotoMaker was developed following ControlNet's workflow. It needs to be adjusted to work with this server setup.
I think some changes need to be made in stable-diffusion.cpp/stable-diffusion.h. Some arguments, like the scheduler type, VAE settings, and controlnets, are passed to the new_sd_ctx() function that loads the models, but they should probably be passed to functions like txt2img(), img2img() and img2vid() instead.
That's completely out of scope for this PR, but it would allow the server to easily support ControlNet and PhotoMaker v2.
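To illustrate the distinction with hypothetical Python stand-ins for the C API (these are not the real signatures, just a sketch of where arguments would live):

```python
# Hypothetical stand-ins, only to show where arguments live.
def new_sd_ctx(model_path, schedule=None):
    return {"model": model_path, "schedule": schedule}

def txt2img(ctx, prompt, schedule=None):
    return f"image({prompt}, schedule={schedule or ctx['schedule']})"

# Today (simplified): the schedule is fixed at context creation, so changing
# it means rebuilding the context and reloading all the models.
ctx = new_sd_ctx("model.gguf", schedule="karras")
txt2img(ctx, "a lovely cat")

# Proposed: the context holds only the loaded weights; per-generation
# settings travel with the generation call instead.
ctx = new_sd_ctx("model.gguf")
txt2img(ctx, "a lovely cat", schedule="karras")
```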
@NNDam , @stduhpf , I briefly looked at the server code. There may be a simple workaround for photomaker.
```cpp
// parse req.body as json using jsoncpp
using json = nlohmann::json;

try {
    std::string json_str = req.body;
    parseJsonPrompt(json_str, &params);
} catch (json::parse_error& e) {
    // assume the request is just a prompt
    // LOG_WARN("Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
    sd_log(sd_log_level_t::SD_LOG_WARN, "Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
    std::string prompt = req.body;
    if (!prompt.empty()) {
        params.prompt = prompt;
    } else {
        params.seed += 1;
    }
} catch (...) {
    // Handle any other type of exception
    // LOG_ERROR("An unexpected error occurred\n");
    sd_log(sd_log_level_t::SD_LOG_ERROR, "An unexpected error occurred\n");
}
```
Could parsing of input_id_images_path be added in the block above, setting params.input_id_images_path to the new path from the request?
@bssrdf That's exactly what I did in the last commit (https://github.com/leejet/stable-diffusion.cpp/pull/367/commits/d0704a536bae4904f9133ef0f1076ac8f7c44f0b): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for PhotoMaker v1 support (though I haven't tried it).
But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().
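Client-side, that commit should make something like this possible, assuming the JSON key matches the parameter name (untested, like the feature itself):

```python
import json
import requests

# input_id_images_path must point at a directory visible from the *server*
# process, since the id images are loaded server-side.
payload = {
    "prompt": "a lovely cat",
    "input_id_images_path": "/path/to/id_images",
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))
```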
Thanks for the information, @stduhpf.
I updated the id_embeds loading to use a raw binary tensor file load (using load_tensor_from_file). It is more efficient to load this way since there is only one tensor. Now it should change/update id_embed based on the request and feed PhotoMaker v2. @NNDam, please retry my PR and let me know if there is still a problem.
It worked!!! Thanks @bssrdf @stduhpf
I'm interested in the server mode because I use sd.cpp to create img2img videos, and I need to reload the model each time.
https://github.com/user-attachments/assets/05d974bf-af68-4397-9d98-d02f539d044b