sd-webui-controlnet

[Feature Request] Integrated openpose editor

Open huchenlei opened this issue 1 year ago • 7 comments

Oftentimes the openpose preprocessor cannot produce the exact pose we want. Here are some problems I have run into:

  • 2 hands are in the image, but only 1 hand is correctly detected.
  • The major pose is correctly detected, but some anchor points are in weird positions (this often happens in non-full-body images).
  • Anime images are not supported for preprocessing.

I propose integrating an openpose editor into the controlnet extension. Here are some existing works:

The first 2 extensions copy controlnet's annotator/openpose directory to detect the pose from an image. None of them has updated the code to support hand/face detection yet. Both extensions support sending the image to be used by controlnet.

The 3D editor feels too complicated to use, as moving an anchor point in 3D is way more complicated than in 2D.

Proposed workflow:

  • An Edit Pose button is shown below the generated image when any openpose model is selected as the preprocessor.
  • Clicking the Edit Pose button sends the JSON openpose data (stored somewhere on the server side when running the preprocessor?) to the openpose editor (a modal?).
  • The user makes the necessary edits (adding missing hands, skeletons, torsos, etc.) and closes the modal; the JSON openpose data is sent back to replace the original JSON openpose data on the server side.
  • The server side renders the processed image again.
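
The round trip in the workflow above could be sketched roughly as follows. This is only an illustration, not controlnet's actual code: the JSON layout (a `people` list holding flat `*_keypoints_2d` arrays of x, y, confidence triples, modeled loosely on the OpenPose output format), the in-memory `POSE_STORE`, and the names `store_pose` and `replace_pose` are all assumptions.

```python
import json

# Hypothetical server-side cache: preprocessor run id -> pose JSON string.
POSE_STORE: dict[str, str] = {}

def store_pose(run_id: str, pose: dict) -> None:
    """Remember the pose detected by the preprocessor for a given run."""
    POSE_STORE[run_id] = json.dumps(pose)

def replace_pose(run_id: str, edited_json: str) -> dict:
    """Validate the edited pose sent back by the editor and overwrite the stored one."""
    pose = json.loads(edited_json)
    for person in pose.get("people", []):
        for key, values in person.items():
            # Assumed layout: every keypoint array is a flat list of triples.
            if key.endswith("_keypoints_2d") and len(values) % 3 != 0:
                raise ValueError(f"{key} must hold (x, y, confidence) triples")
    POSE_STORE[run_id] = json.dumps(pose)
    return pose  # the caller would re-render the control image from this

# Usage: the detector found only the body; the user adds a left hand in the editor.
detected = {"people": [{"pose_keypoints_2d": [100.0, 50.0, 1.0, 110.0, 90.0, 1.0]}]}
store_pose("run-1", detected)
edited = json.loads(POSE_STORE["run-1"])
edited["people"][0]["hand_left_keypoints_2d"] = [120.0, 130.0, 1.0]
result = replace_pose("run-1", json.dumps(edited))
```

Keeping the stored value as a JSON string means the same text can be handed to the editor and accepted back without any extra conversion step.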

I am not sure what the best way is to hijack the openpose JSON data on the server side. Please share any ideas, thanks!

huchenlei avatar Apr 24 '23 04:04 huchenlei

I couldn't agree with you more. I had the same idea and have been looking at the openpose code, but I found it was not easy for me :(

CharlesTHN avatar Apr 24 '23 10:04 CharlesTHN

There is also an openpose editor specifically for hands: https://github.com/zackhxn/openpose-hand-editor It also ports controlnet's code, but for hand detection.

Currently we live in a very chaotic space where multiple extensions are available, but none does the task very well.

huchenlei avatar Apr 24 '23 15:04 huchenlei

An openpose editor requires a JavaScript expert, which I unfortunately am not. I believe we need a JS expert to do this for us.

continue-revolution avatar Apr 24 '23 17:04 continue-revolution

I can help with the JS code. The JS implementations in the above extensions except the 3D extension are pretty minimal (<1k lines of JS).

huchenlei avatar Apr 24 '23 18:04 huchenlei

I have roughly sorted out the data flow needed to achieve the functionality:

Build a separate sd-webui extension

The extension will use the sd-webui api to expose a FastAPI path:

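import gradio as gr
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from modules import script_callbacks
templates = Jinja2Templates(directory="templates")  # template directory is an assumption

def mount_openpose_api(_: gr.Blocks, app: FastAPI):
    @app.post('/openpose_editor', response_class=HTMLResponse)
    async def index(request: Request):  # FastAPI injects the incoming request
        return templates.TemplateResponse('index.html', {"request": request})

script_callbacks.on_app_started(mount_openpose_api)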

Embed the openpose extension's page into controlnet

  • Let controlnet display an iframe pointing to /openpose_editor when the edit button is clicked. Both the original image and the openpose JSON data are sent to the iframe as POST request parameters.
  • The user edits the pose in the iframe, which sends the processed openpose JSON data back through window.parent.postMessage.
// "*" accepts any target origin; a real implementation should probably name the parent's origin.
window.parent.postMessage("Data from the child iframe!", "*");
  • Controlnet receives the new openpose JSON data by listening for the message event.
// `state` and `button` are assumed to be existing DOM elements on the controlnet page.
window.addEventListener("message", function (event) {
    console.log("Message received from the child iframe:", event.data);
    // Simulate a click on a button so that ControlNet's Python backend
    // receives the event. The data is passed through gr.State.
    state.textContent = event.data;
    button.click();
});
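
On the Python side, whatever string arrives from the iframe still has to be recognized as a pose message before it is used, since other scripts on the page can post messages too. A minimal sketch of that check, assuming a JSON envelope with `type` and `payload` fields; the thread does not define the protocol, so the envelope, the `POSE_MESSAGE_TYPE` tag, and `parse_editor_message` are all hypothetical:

```python
import json
from typing import Optional

# Hypothetical message type tag; the thread does not define the protocol.
POSE_MESSAGE_TYPE = "openpose_editor/pose"

def parse_editor_message(raw: str) -> Optional[dict]:
    """Return the pose payload if `raw` is a pose message, else None."""
    try:
        message = json.loads(raw)
    except (TypeError, ValueError):
        return None  # not JSON, e.g. a message from some other script
    if not isinstance(message, dict) or message.get("type") != POSE_MESSAGE_TYPE:
        return None
    payload = message.get("payload")
    return payload if isinstance(payload, dict) else None

# A well-formed pose message is unwrapped; anything else is ignored.
ok = parse_editor_message(json.dumps(
    {"type": POSE_MESSAGE_TYPE, "payload": {"people": []}}
))
ignored = parse_editor_message("Data from the child iframe!")
```

Returning None instead of raising lets the handler silently skip unrelated messages, which matters if several editors or extensions share the same page.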

Conclusion

By doing this, we do not need to host the openpose editor's code within controlnet. Users can also choose any openpose editor that supports controlnet's message protocol.

Alternative

Web UI uses a very clumsy pure-Gradio way to send data between components. I suspect following that route would take more effort and limit the UI interaction to Gradio's scope. The currently proposed approach will allow us to use a JavaScript front-end library (Vue.js, etc.) to simplify pure front-end interactions.

huchenlei avatar Apr 30 '23 21:04 huchenlei

I am building a pure front-end openpose editor here: https://github.com/huchenlei/sd-webui-openpose-editor

huchenlei avatar May 07 '23 04:05 huchenlei

Progress Update

The UI layout is mostly done. I can probably start the controlnet-side work soon.

huchenlei avatar May 09 '23 04:05 huchenlei