
ControlNet

Open neverix opened this issue 3 years ago • 11 comments

The pose of generated characters can't currently be controlled in a fine-grained way. Adding this would allow for 3D animation to be generated as well.

Describe the solution you'd like: Implement ControlNet; specifically, the pose and segmentation models.

Describe alternatives you've considered: Some sort of image-to-image (or model-to-model) approach.

neverix avatar Feb 18 '23 18:02 neverix

@neverix Yes, this is interesting. I guess the pose model could be used to synthesize better humans by initializing a skeleton and rendering it from different views as a prior. However, it is quite task-specific and I may try it later in other projects.
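The skeleton-prior idea could work roughly like this: project a fixed 3D skeleton's joints into 2D for each sampled camera azimuth, and use the resulting OpenPose-style keypoints as the conditioning image for a pose ControlNet. A minimal sketch of the projection step, where the joint coordinates and the orthographic camera are illustrative assumptions, not code from this repo:

```python
import numpy as np

# Hypothetical 3D stick-figure joints (x, y, z), centered on the y axis.
SKELETON = np.array([
    [0.00, 1.7, 0.0],   # head
    [0.00, 1.4, 0.0],   # neck
    [-0.30, 1.4, 0.0],  # left shoulder
    [0.30, 1.4, 0.0],   # right shoulder
    [0.00, 0.9, 0.0],   # pelvis
    [-0.15, 0.0, 0.0],  # left foot
    [0.15, 0.0, 0.0],   # right foot
])

def render_pose_keypoints(azimuth_deg: float) -> np.ndarray:
    """Rotate the skeleton about the vertical (y) axis and project
    orthographically onto the image plane (drop z). Returns (N, 2)."""
    t = np.deg2rad(azimuth_deg)
    rot = np.array([
        [np.cos(t), 0.0, np.sin(t)],
        [0.0,       1.0, 0.0],
        [-np.sin(t), 0.0, np.cos(t)],
    ])
    rotated = SKELETON @ rot.T
    return rotated[:, :2]
```

The 2D keypoints for each training view would then be rasterized into a pose image and passed to the ControlNet alongside the text prompt, so the diffusion guidance is consistent with the camera actually being rendered.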

ashawkey avatar Feb 20 '23 02:02 ashawkey

@ashawkey does dreamfusion have a way to use img2img instead of txt2img? Then you could manually create a predefined image with ControlNet that shows the object from all directions, and feed that to dreamfusion.

ForceConstant avatar Feb 20 '23 10:02 ForceConstant

> @neverix Yes, this is interesting. I guess the pose model could be used to synthesize better humans by initializing a skeleton and rendering it from different views as a prior. However, it is quite task-specific and I may try it later in other projects.

The basic approach of rendering a colored mesh would also be useful for rendering from segmentation or applying img2img. If I implemented it, I would add a hook so users can plug that functionality in.
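Such a hook could be as simple as a registry of callables applied to each rendered frame before it is passed to the diffusion guidance. A hypothetical sketch, assuming none of these names exist in stable-dreamfusion:

```python
# Minimal render-hook registry sketch (all names here are hypothetical,
# not part of stable-dreamfusion's actual API).
class RenderHooks:
    def __init__(self):
        self._hooks = []

    def register(self, fn):
        """Register fn(rgb, extras) -> rgb, called after each render."""
        self._hooks.append(fn)
        return fn  # allows use as a decorator

    def apply(self, rgb, extras=None):
        """Run each registered hook on the rendered output in order."""
        for fn in self._hooks:
            rgb = fn(rgb, extras or {})
        return rgb
```

A ControlNet user could then register a hook that derives a segmentation or pose map from the render extras without touching the core training loop.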

neverix avatar Feb 20 '23 10:02 neverix

@ashawkey I just wanted to add how big a deal something like this would be: being able to generate real 3D characters and models using Stable Diffusion would be huge.

ForceConstant avatar Feb 20 '23 10:02 ForceConstant

It's already possible, just not using ControlNet

neverix avatar Feb 20 '23 11:02 neverix

The biggest problem with generating good results is that Stable Diffusion does not output the same object for different prompts.

If I get it right, words like "front view", "side view", etc. are added to the text prompt to generate images from other views. But if you run e.g. python nerf/sd.py --seed 100 "armchair front view" and "armchair side view", you get two totally different armchair pictures. You can also see that most pictures tagged "side view" do not show the side view of the object; they mostly show a front view instead.

I could be wrong.
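The view suffix described above is chosen from the sampled camera azimuth before being appended to the prompt; something along these lines, where the exact angle thresholds are illustrative assumptions, not the repo's values:

```python
def view_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a view-dependent suffix chosen from the camera azimuth.

    Thresholds below are illustrative assumptions, not the values
    used by stable-dreamfusion.
    """
    a = azimuth_deg % 360.0
    if a < 45 or a >= 315:
        suffix = "front view"
    elif a < 135 or a >= 225:
        suffix = "side view"
    else:
        suffix = "back view"
    return f"{prompt}, {suffix}"
```

The weakness flobotics points out is exactly that this suffix is only a soft hint: the base model frequently ignores it, which is why explicit conditioning (pose, depth, segmentation) is attractive.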

flobotics avatar Feb 20 '23 12:02 flobotics

> The biggest problem with generating good results is that Stable Diffusion does not output the same object for different prompts.
>
> If I get it right, words like "front view", "side view", etc. are added to the text prompt to generate images from other views. But if you run e.g. python nerf/sd.py --seed 100 "armchair front view" and "armchair side view", you get two totally different armchair pictures. You can also see that most pictures tagged "side view" do not show the side view of the object; they mostly show a front view instead.
>
> I could be wrong.

Yes, ControlNet grounding could potentially help solve this. But the view-dependent signal is already quite good, and there are other ways to make progress on this.

neverix avatar Feb 20 '23 12:02 neverix

> It's already possible, just not using ControlNet

How is it already possible?

Preternature avatar Feb 21 '23 21:02 Preternature

> > It's already possible, just not using ControlNet
>
> How is it already possible?

Well, you are looking at the repo that makes it possible.

neverix avatar Feb 22 '23 08:02 neverix

I tried ControlNet but got this: [image: df_ep0016_0002_rgb] Just like with vanilla NeRF, it is likely misconvergence.

zz7379 avatar Mar 01 '23 15:03 zz7379

> I tried ControlNet but got this: [image: df_ep0016_0002_rgb] Just like with vanilla NeRF, it is likely misconvergence.

Hi, I met the same problem. Did you solve it?

sjx-ali avatar Apr 06 '23 09:04 sjx-ali