ControlNet
The pose of generated characters can't currently be controlled in a fine-grained way. Adding this would allow for 3D animation to be generated as well.
Describe the solution you'd like
Implement ControlNet, specifically the pose and segmentation models.
Describe alternatives you've considered
Do some sort of image-to-image (or model-to-model) approach.
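As a rough illustration (not code from this repo), wiring in the ControlNet pose model through the diffusers library could look like the sketch below; for the SDS guidance in nerf/sd.py, the ControlNet conditioning would have to be fed into the noise-prediction step rather than a full txt2img pipeline. The model names and the placeholder pose image are assumptions.

```python
# Rough sketch, not repo code: loading the ControlNet pose model with diffusers.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Placeholder: an OpenPose-style skeleton rendered from the current camera view.
pose_image = Image.open("pose.png")

image = pipe("a photo of a person, full body", image=pose_image).images[0]
```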
@neverix Yes, this is interesting. I guess the pose model could be used to synthesize better humans by initializing a skeleton and rendering it from different views as a prior. However, it is quite task-specific, and I may try it later in other projects.
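For reference, a hypothetical sketch of that skeleton-as-a-prior idea: keep one set of 3D joints and project them from whatever camera the current view uses, so the pose conditioning image stays consistent across azimuths. The joint layout, camera model, and function name below are illustrative assumptions, not code from this repo.

```python
import numpy as np

def project_skeleton(joints_3d, azimuth_deg, radius=2.5, focal=500.0, res=512):
    """Project 3D joints to 2D pixels for a camera circling the origin."""
    a = np.deg2rad(azimuth_deg)
    cam = np.array([radius * np.sin(a), 0.0, radius * np.cos(a)])  # camera position
    forward = -cam / np.linalg.norm(cam)                           # look at origin
    right = np.cross([0.0, 1.0, 0.0], forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    R = np.stack([right, up, forward])          # world -> camera rotation (rows)
    pts = (joints_3d - cam) @ R.T               # joints in camera space
    uv = focal * pts[:, :2] / pts[:, 2:3]       # pinhole projection
    return uv + res / 2                         # center in a res x res image

# Toy joints (head, neck, shoulders) projected from two views:
joints = np.array([[0.0, 0.6, 0.0], [0.0, 0.4, 0.0], [0.2, 0.4, 0.0], [-0.2, 0.4, 0.0]])
uv_front = project_skeleton(joints, azimuth_deg=0)
uv_side = project_skeleton(joints, azimuth_deg=90)
```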
@ashawkey does dreamfusion have a way to use img2img instead of txt2img? Then you could manually create a predefined ControlNet image that shows the object from all directions, and feed that to dreamfusion.
The basic approach of rendering a colored mesh would also be useful for rendering from segmentation or applying img2img. If I implemented it, I would add a hook that users can plug that functionality into.
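A hypothetical shape for such a hook (the class, keys, and names are made up for illustration; this is not an existing API in the repo):

```python
# Hypothetical hook interface: the trainer would call every registered hook with
# the rendered buffers, and a hook can return extra conditioning images (e.g. a
# segmentation map for ControlNet-seg, or an init image for img2img).
from typing import Callable, Dict, List
import torch

RenderHook = Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

class GuidanceHooks:
    def __init__(self) -> None:
        self._hooks: List[RenderHook] = []

    def register(self, hook: RenderHook) -> None:
        self._hooks.append(hook)

    def __call__(self, render_outputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        extras: Dict[str, torch.Tensor] = {}
        for hook in self._hooks:
            extras.update(hook(render_outputs))
        return extras

# Example user hook: turn the rendered alpha mask into a crude two-class
# segmentation image ("weights_sum" is an assumed key, for illustration only).
def seg_hook(render_outputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    mask = render_outputs["weights_sum"]            # [H, W] opacity
    return {"seg_cond": mask.unsqueeze(0).repeat(3, 1, 1)}
```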
@ashawkey I just wanted to add how big a deal something like this would be: being able to generate real 3D characters and models using Stable Diffusion would be huge.
It's already possible, just not using ControlNet
The biggest problem with generating good results is that Stable Diffusion does not output the same object with different prompts.
If I understand correctly, words like "front view", "side view", etc. are added to the text prompt to generate images from other views. But if you run e.g. python nerf/sd.py --seed 100 "armchair front view" and "armchair side view", you get two totally different armchair pictures. Also, most pictures tagged "side view" do not actually show the side view of the object but mostly a front view instead.
I could be wrong.
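For context, the view-dependent prompting the comment above refers to boils down to something like the sketch below: pick a suffix from the camera azimuth/elevation and append it to the base prompt. The thresholds and function name here are illustrative, not the repo's exact values.

```python
def view_prompt(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a view suffix chosen from the camera pose (illustrative thresholds)."""
    if elevation_deg > 60:
        view = "overhead view"
    elif abs(azimuth_deg) < 45:
        view = "front view"
    elif abs(azimuth_deg) > 135:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# e.g. view_prompt("armchair", azimuth_deg=90, elevation_deg=10) -> "armchair, side view"
```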
Yes, ControlNet grounding could potentially help solve this. But the signal is already quite good, and there are other ways to make progress on this.
How is it already possible without ControlNet?
Well, you are looking at the repo that makes it possible.
I tried ControlNet but got this:
Just like with vanilla NeRF, it is likely misconvergence.
Hi, I met the same problem. Did you solve it?