stable-diffusion.cpp
Very simple HTTP server
This is a very simple server that I made to be able to generate different prompts without reloading the models every time.
Edit (03/01/25):
Now with a basic GUI (served at the /index.html endpoint for now). The API has changed a bit, so the rest of the old examples are outdated; I'll fix them later.
Mostly outdated instructions
Starting the server
The syntax is pretty much the same as the CLI.

```
.\build\bin\Release\sd-server.exe --diffusion-model ..\ComfyUI\models\unet\flux1-schnell-Q3_k.gguf --vae ..\ComfyUI\models\vae\ae.q8_0.gguf --clip_l ..\ComfyUI\models\clip\clip_l.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5xxl_q4_k.gguf -p "Default prompt" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "server_output.png"
```
How to use (example):
Using the example client script
- Make sure you have Python with the `requests` and `pillow` modules installed: `pip install requests pillow`
- Launch the client in interactive mode: `python -i examples/server/test_client.py`
Simplest setup
- Make sure you have Python installed with the `requests` module: `pip install requests`
- Open a Python REPL: `python`
- Import the requests module: `>>> import requests`
- Post your prompt directly to the `/txt2img` endpoint: `>>> requests.post("http://localhost:8080/txt2img", "a lovely cat holding a sign says 'flux.cpp'")`
- Images will be saved to disk on the server side, and each generation will overwrite the previous one.
Using json payloads
- Make sure you have Python installed with the `requests` module: `pip install requests`
- Open a Python REPL: `python`
- Import the requests and json modules: `>>> import requests, json`
- Construct your JSON payload with generation parameters: `>>> payload = {'prompt': """a lovely cat holding a sign says "flux.cpp" """, 'height': 768, 'seed': 42, 'sample_steps': 4}`
- Post your payload to the `/txt2img` endpoint: `>>> requests.post("http://localhost:8080/txt2img", json.dumps(payload))`
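As a side note, requests can also serialize the payload itself via the `json` keyword, which should be equivalent here since the server just parses the request body as JSON:

```python
import requests

payload = {'prompt': 'a lovely cat', 'height': 768, 'seed': 42, 'sample_steps': 4}
# requests serializes the dict and sets the Content-Type header itself
requests.post("http://localhost:8080/txt2img", json=payload)
```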
Decoding response using pillow
- Make sure both `requests` and `pillow` are installed: `pip install requests pillow`
- Open a Python REPL: `python`
- Import the requests, json and base64 modules: `>>> import requests, json, base64`
- Import io.BytesIO and PIL.Image: `>>> from io import BytesIO` then `>>> from PIL import Image`
- Get the response from the server: `>>> response = requests.post("http://localhost:8080/txt2img", "a lovely cat holding a sign says 'flux.cpp'")`
- Parse the response text as JSON: `>>> parsed = json.loads(response.text)`
- Decode the base64 image data: `>>> pngbytes = base64.b64decode(parsed[0]["data"])`
- Convert to a PIL Image: `>>> image = Image.open(BytesIO(pngbytes))`
- Display the image in the default viewer: `>>> image.show()`
One-liner
- First import the necessary modules: `>>> import requests, json, base64`, `>>> from io import BytesIO`, `>>> from PIL import Image`
- Use this line to send the request and open all the generated images: `>>> [Image.open(BytesIO(base64.b64decode(img["data"]))).show() for img in json.loads(requests.post("http://localhost:8080/txt2img", json.dumps({'seed': -1, 'batch_count': 4, 'sample_steps': 4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """})).text)]`
- To send another payload after it's finished, press the up arrow and edit the payload.
If you don't want the image viewer to pause the execution of your command, you can do the following (not needed on macOS for some reason):
`>>> from threading import Thread`
`>>> [Thread(target=Image.open(BytesIO(base64.b64decode(img["data"]))).show, args=()).start() for img in json.loads(requests.post("http://localhost:8080/txt2img", json.dumps({'seed': -1, 'batch_count': 4, 'sample_steps': 4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """})).text)]`
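The same thing as a readable script, for anyone who prefers that over the one-liner:

```python
import base64
import json
from io import BytesIO

import requests
from PIL import Image

payload = {"seed": -1, "batch_count": 4, "sample_steps": 4,
           "prompt": 'a lovely cat holding a sign says "flux.cpp"'}
response = requests.post("http://localhost:8080/txt2img", json.dumps(payload))

# The server returns a JSON array of generated images, base64-encoded
for img in json.loads(response.text):
    Image.open(BytesIO(base64.b64decode(img["data"]))).show()
```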
I'm excited about this one, and was attempting to combine it with Vulkan.
I'm seeing a compile-time issue (around the pingpong function) in my merge, and it seems it's in the original as well.
```
stable-diffusion.cpp/examples/server/main.cpp:572:24: error: non-local lambda expression cannot have a capture-default
  572 | const auto pingpong = [&](const httplib::Request &, httplib::Response & res) {
      |                        ^
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp: In lambda function:
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp:672:5: warning: control reaches end of non-void function [-Wreturn-type]
  672 |     };
```
Ah! This function should go; I just added it at the start of development to see if I was able to connect to the server. If it's causing issues, just remove it, along with the few things that depend on it.
@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).
Tested it on gcc 12.2.0-14 on Debian.
Yup, removing the pingpong endpoint allows compilation.
Another thought--the default 'localhost' string didn't work on my end initially. It looks like the llama.cpp server defaults to using 127.0.0.1 instead of 'localhost', so it might be worth setting the default string that way. Not a big deal, though.
I was able to generate an image via requests, but hit a segfault immediately afterwards.
```
[DEBUG] ggml_extend.hpp:977 - flux compute buffer size: 397.27 MB(RAM)
|==================================================| 4/4 - 79.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:977 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967 - computing vae [mode: DECODE] graph completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 14.59s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 341.62s
save result image to 'server_output.png'
Segmentation fault
```
I've played around a bit (not much of a C++ coder at this point), and can't reliably track down where it's coming from. I'm running with batch 1 (so only one image); the first image gets written properly, with tags, then the dreaded segfault.
Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)
Right--I should have said I ran the earlier example with the CPU backend (tried with no BLAS just to confirm it wasn't something in my merge that caused this!). It's much faster with Vulkan.
I can confirm I seem to throw a segfault with the server every time with:
- CPU (no BLAS), from server branch with SDXL
- CPU (no BLAS), from server branch with Flux Schnell, q8 quants
- Vulkan, merged into SkuttleOleg's with Flux Schnell and q8 quants
For each of the above, they run fine with the main cli example (although painfully slowly on CPU).
Hmm, it doesn't happen on my machine, which makes this annoying to debug. I'll try running it on WSL to see if it's a Linux thing.
Edit: It does happen on WSL too! So maybe I can fix it.
@theaerotoad I believe it's fixed now.
@stduhpf Yup, that fixes it. Thank you!
Sure is nice not to have to reload everything each time.
@stduhpf -- This is working pretty well; I played around with it a bit this weekend. I have a few tweaks to enable other inputs to be specified (via HTML form inputs), return the image as part of the POST response, and reduce CPU usage (using t.join() rather than while(1) at the end; the idea is sketched below).
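For what it's worth, here is the busy-wait vs. join idea sketched in Python terms (the actual change is in the C++ server; the names here are purely illustrative):

```python
import threading
import time

def serve():
    # stand-in for the server's listen loop
    while True:
        time.sleep(1)

t = threading.Thread(target=serve)
t.start()

# A busy loop like `while True: pass` keeps one core at 100% doing nothing.
# Blocking on the thread keeps the process alive without spinning:
t.join()
```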
Do you want them? I may just share as a gist, or can branch off your repo. What's your preference?
@theaerotoad Both options are fine with me, thanks for helping.
I thought about returning the image in base64 after each generation, but I was too lazy to implement it.
I just spent hours trying to understand why the server wasn't sending the image metadata as it's supposed to; it turns out PIL automatically strips out the metadata, and the server was working fine 🙃.
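For anyone hitting the same thing: PIL keeps PNG text chunks in `Image.text` but drops them on save unless you pass them back explicitly. A minimal sketch (file names are placeholders):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

im = Image.open("server_output.png")
print(im.text)  # tEXt/iTXt chunks, e.g. the generation parameters

# Re-attach the metadata explicitly, otherwise the saved copy loses it
meta = PngInfo()
for key, value in im.text.items():
    meta.add_text(key, value)
im.save("copy_with_metadata.png", pnginfo=meta)
```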
There are some differences from the AUTOMATIC1111 v1 webui API. You use:
- `sample_steps` instead of `steps`
- `batch_count` instead of `batch_size`
- `sample_method` instead of `sampler_index`

This info might be outdated, though; I just wanted to make my bot work with your API, so this is just what jumped out at me. We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, to avoid being incompatible unnecessarily.
edit: links: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API
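If a client already speaks the AUTOMATIC1111 payload format, a rough translation shim based on the differences above might look like this (a sketch; the exact key names may have drifted since this was written):

```python
import json
import requests

# AUTOMATIC1111 /sdapi/v1/txt2img keys -> sd-server keys, per the list above
A1111_TO_SDCPP = {
    "steps": "sample_steps",
    "batch_size": "batch_count",
    "sampler_index": "sample_method",
}

def translate_payload(payload):
    # rename known keys, pass everything else through unchanged
    return {A1111_TO_SDCPP.get(k, k): v for k, v in payload.items()}

a1111_payload = {"prompt": "a lovely cat", "steps": 4, "batch_size": 2}
requests.post("http://localhost:8080/txt2img", json.dumps(translate_payload(a1111_payload)))
```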
I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h : https://github.com/leejet/stable-diffusion.cpp/blob/14206fd48832ab600d9db75f15acb5062ae2c296/stable-diffusion.h#L148-L164
Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?
> I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h

I see.

> Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

I suppose.
If anyone just wants to run a command:
```
curl -sv --json '{"prompt": "a lovely cat", "seed": -1}' 127.0.0.1:7860/txt2img | jq -r .[0].data | base64 -d - > api_result.png
```
@stduhpf thanks for your work. I'm currently using this PR for PhotoMaker v2, but I get an error when changing the input embedding (I believe it's called "input_id_images_path"). How can I input a different face without reloading the whole SDXL model, or is there some function to reload just the face embedding?
@NNDam You can try with my latest commit. I can't test it on my end, but it should work now?
@stduhpf thanks, I tried but it still doesn't work. The main problem is that when first loading the model, I also need to preload --input-id-images-dir, extracted with the script face_detect.py in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded at startup (and also segfaults if the number of current faces differs from the number of preloaded faces).
Oh I see. Well, even if Support for PhotoMaker Version 2 was merged, I couldn't get this to work with the current architecture of the server, sorry. Have you tried with photomaker v1?
Hi @bssrdf, can you help us ?
@NNDam, I'll see what can be done to make it work. PhotoMaker was developed following ControlNet's workflow. It needs to be adjusted to work with this server setup.
I think some changes need to be made in stable-diffusion.cpp/stable-diffusion.h. Some arguments, like the scheduler type, VAE settings, and controlnets, are passed to the new_sd_ctx() function that loads the models, but they should probably be passed to functions like txt2img(), img2img() and img2vid() instead.
That's completely out of scope for this PR, but it would allow the server to easily support ControlNet and PhotoMaker v2.
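To illustrate the distinction with hypothetical Python stand-ins for the C API (these are not the real signatures, just a sketch of where arguments would live):

```python
# Hypothetical stand-ins, only to show where arguments live.
def new_sd_ctx(model_path, schedule=None):
    return {"model": model_path, "schedule": schedule}

def txt2img(ctx, prompt, schedule=None):
    return f"image({prompt}, schedule={schedule or ctx['schedule']})"

# Today (simplified): the schedule is fixed at context creation, so changing
# it means rebuilding the context and reloading all the models.
ctx = new_sd_ctx("model.gguf", schedule="karras")
txt2img(ctx, "a lovely cat")

# Proposed: the context holds only the loaded weights; per-generation
# settings travel with the generation call instead.
ctx = new_sd_ctx("model.gguf")
txt2img(ctx, "a lovely cat", schedule="karras")
```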
@NNDam , @stduhpf , I briefly looked at the server code. There may be a simple workaround for photomaker.
```cpp
// parse req.body as json using jsoncpp
using json = nlohmann::json;

try {
    std::string json_str = req.body;
    parseJsonPrompt(json_str, &params);
} catch (json::parse_error& e) {
    // assume the request is just a prompt
    // LOG_WARN("Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
    sd_log(sd_log_level_t::SD_LOG_WARN, "Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
    std::string prompt = req.body;
    if (!prompt.empty()) {
        params.prompt = prompt;
    } else {
        params.seed += 1;
    }
} catch (...) {
    // Handle any other type of exception
    // LOG_ERROR("An unexpected error occurred\n");
    sd_log(sd_log_level_t::SD_LOG_ERROR, "An unexpected error occurred\n");
}
```
Could parsing of input_id_images_path be added in the block above, setting params.input_id_images_path to the new path from the request?
@bssrdf That's exactly what I did in the last commit (https://github.com/leejet/stable-diffusion.cpp/pull/367/commits/d0704a536bae4904f9133ef0f1076ac8f7c44f0b): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for PhotoMaker v1 support (though I haven't tried it).
But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().
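Client-side, that commit should make something like this possible, assuming the JSON key matches the parameter name (untested, like the feature itself):

```python
import json
import requests

# input_id_images_path must point at a directory visible from the *server*
# process, since the id images are loaded server-side.
payload = {
    "prompt": "a lovely cat",
    "input_id_images_path": "/path/to/id_images",
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))
```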
Thanks for the information, @stduhpf.
I updated the id_embeds loading to use a raw binary tensor file load (using load_tensor_from_file). It is more efficient to load this way since there is only one tensor. Now it should change/update id_embed based on the request and feed PhotoMaker v2. @NNDam, please retry my PR and let me know if there is still a problem.
It worked!!! Thanks @bssrdf @stduhpf
I'm interested in the server mode because I use sd.cpp to create img2img videos, and I need to reload the model each time.
https://github.com/user-attachments/assets/05d974bf-af68-4397-9d98-d02f539d044b